ChatGPT API Integration: Building Production AI Features with OpenAI
Integrating OpenAI's API into a production application is straightforward for a demo and genuinely complex at scale. Rate limits, cost control, streaming UX, context management, and prompt engineering all matter in ways that don't show up until you're handling real traffic.
This guide covers the complete production integration: API setup, streaming responses, function calling, RAG-based context injection, cost management, and the architectural patterns that prevent the AI layer from becoming a liability.
API Setup and Client Abstraction
The first architectural decision: never call the OpenAI API inline throughout your codebase. Centralize it behind an abstraction layer that gives you a single place to add retry logic, cost tracking, model switching, and fallbacks.
// lib/ai/client.ts — centralized LLM client
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
timeout: 30000, // 30-second timeout
maxRetries: 2, // Built-in retry on 429/500
});
interface ChatOptions {
model?: 'gpt-4o' | 'gpt-4o-mini' | 'gpt-4-turbo';
temperature?: number;
maxTokens?: number;
systemPrompt?: string;
userId?: string; // For cost attribution
}
interface ChatResult {
content: string;
usage: {
promptTokens: number;
completionTokens: number;
totalTokens: number;
estimatedCostUsd: number;
};
}
// Token cost table (April 2026)
const COST_PER_1K: Record<string, { input: number; output: number }> = {
'gpt-4o': { input: 0.0025, output: 0.01 },
'gpt-4o-mini': { input: 0.00015, output: 0.0006 },
'gpt-4-turbo': { input: 0.01, output: 0.03 },
};
export async function chat(
messages: OpenAI.Chat.ChatCompletionMessageParam[],
options: ChatOptions = {}
): Promise<ChatResult> {
const model = options.model ?? 'gpt-4o-mini';
const response = await openai.chat.completions.create({
model,
messages: [
...(options.systemPrompt ? [{ role: 'system' as const, content: options.systemPrompt }] : []),
...messages,
],
temperature: options.temperature ?? 0.7,
max_tokens: options.maxTokens ?? 1024,
});
const usage = response.usage!;
const costs = COST_PER_1K[model];
const estimatedCostUsd =
(usage.prompt_tokens / 1000) * costs.input +
(usage.completion_tokens / 1000) * costs.output;
// Track usage asynchronously (don't block the response); trackUsage is your
// own persistence helper, e.g. an insert into an ai_usage table
trackUsage({
model,
promptTokens: usage.prompt_tokens,
completionTokens: usage.completion_tokens,
costUsd: estimatedCostUsd,
userId: options.userId,
}).catch(console.error);
return {
content: response.choices[0].message.content ?? '',
usage: {
promptTokens: usage.prompt_tokens,
completionTokens: usage.completion_tokens,
totalTokens: usage.total_tokens,
estimatedCostUsd,
},
};
}
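The abstraction layer is also the natural place for the fallbacks mentioned above. A minimal sketch of the pattern (the `withFallback` name and shape are ours, not part of the OpenAI SDK):

```typescript
// Illustrative helper: run the primary model call, and fall back to an
// alternative (a cheaper model, or another provider) if it throws.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch {
    // Rate limit, timeout, or outage on the primary: degrade gracefully
    return fallback();
  }
}
```

In practice you would wrap two `chat()` calls with different `model` options, and log which path actually served the request so fallback rates are visible in monitoring.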
Streaming Responses
For chat interfaces, streaming is essential. A 3-second wait for a response feels broken; streamed tokens appearing within 200ms feels fast even if total time is the same.
Server-Side Streaming (Node.js + SSE)
// API route: streaming chat endpoint
app.post('/api/chat/stream', authenticate, async (req, res) => {
const { messages, conversationId } = req.body;
// Set SSE headers
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
res.setHeader('X-Accel-Buffering', 'no'); // disable nginx buffering
res.flushHeaders(); // send headers now so the client sees the stream open immediately
try {
const stream = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: await buildMessages(conversationId, messages, req.user.sub),
stream: true,
max_tokens: 1024,
});
let fullContent = '';
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta?.content;
if (delta) {
fullContent += delta;
res.write(`data: ${JSON.stringify({ content: delta })}\n\n`);
}
}
// Save complete response to conversation history
await saveAssistantMessage(conversationId, fullContent);
res.write('data: [DONE]\n\n');
res.end();
} catch (err: any) {
res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
res.end();
}
});
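One detail the route above omits: if the client disconnects mid-stream, the upstream completion keeps running (and billing). The fix is to abort the OpenAI request when the response socket closes. A sketch of the pattern, with `doStream` standing in for the OpenAI call; in real code you would pass `controller.signal` as the request's abort signal and wire `req.on('close', ...)`:

```typescript
// Abort upstream work when the SSE client disconnects.
async function streamWithAbort(
  doStream: (signal: AbortSignal) => Promise<string>,
  onClientClose: (cb: () => void) => void
): Promise<string | null> {
  const controller = new AbortController();
  onClientClose(() => controller.abort()); // e.g. req.on('close', ...)
  try {
    return await doStream(controller.signal);
  } catch (err) {
    // If we aborted it ourselves, the client went away: not an error
    if (controller.signal.aborted) return null;
    throw err;
  }
}
```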
Client-Side Streaming (React)
function ChatInterface({ conversationId }: { conversationId: string }) {
const [messages, setMessages] = useState<Message[]>([]);
const [streaming, setStreaming] = useState(false);
const sendMessage = async (userInput: string) => {
const userMsg: Message = { role: 'user', content: userInput };
setMessages(prev => [...prev, userMsg]);
setStreaming(true);
// Add empty assistant message to stream into
setMessages(prev => [...prev, { role: 'assistant', content: '' }]);
const response = await fetch('/api/chat/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages: [userMsg], conversationId }),
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// A network read can end mid-event, so buffer partial data between reads
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop() ?? ''; // keep the incomplete trailing event
const lines = events.flatMap(e => e.split('\n')).filter(line => line.startsWith('data: '));
for (const line of lines) {
const data = line.slice(6);
if (data === '[DONE]') continue;
const { content, error } = JSON.parse(data);
if (error) console.error(error);
if (content) {
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1] = {
role: 'assistant',
content: updated[updated.length - 1].content + content,
};
return updated;
});
}
}
}
setStreaming(false);
};
return (
<div>
<MessageList messages={messages} />
<ChatInput onSend={sendMessage} disabled={streaming} />
</div>
);
}
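The chunk handling inside that read loop is the part most worth unit-testing, because SSE events routinely split across network reads. It can be factored into a pure helper (the name is ours):

```typescript
// Pure SSE parser: combine the leftover buffer with a new chunk and
// return every complete `data:` payload plus the new partial remainder.
function parseSseChunk(
  buffer: string,
  chunk: string
): { events: string[]; buffer: string } {
  const parts = (buffer + chunk).split('\n\n');
  const rest = parts.pop() ?? ''; // last piece may be an incomplete event
  const events = parts
    .flatMap(p => p.split('\n'))
    .filter(line => line.startsWith('data: '))
    .map(line => line.slice(6));
  return { events, buffer: rest };
}
```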
Function Calling
Function calling lets the model invoke structured tools — database queries, API calls, calculations — with type-safe arguments.
// Define tools the model can call
const tools: OpenAI.Chat.ChatCompletionTool[] = [
{
type: 'function',
function: {
name: 'search_products',
description: 'Search the product catalog by keyword, category, or price range',
parameters: {
type: 'object',
properties: {
query: { type: 'string', description: 'Search query' },
category: { type: 'string', enum: ['electronics', 'clothing', 'books', 'home'] },
maxPrice: { type: 'number', description: 'Maximum price in USD' },
limit: { type: 'number', description: 'Number of results (1-20)', default: 5 },
},
required: ['query'],
},
},
},
{
type: 'function',
function: {
name: 'get_order_status',
description: "Get the status of a customer's order",
parameters: {
type: 'object',
properties: {
orderId: { type: 'string', description: 'Order ID' },
},
required: ['orderId'],
},
},
},
];
// Agentic loop: model calls tools, results fed back, model continues
async function agentChat(userMessage: string, userId: string): Promise<string> {
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: 'user', content: userMessage },
];
while (true) {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
tools,
tool_choice: 'auto',
});
const choice = response.choices[0];
messages.push(choice.message);
if (choice.finish_reason !== 'tool_calls') {
// Covers 'stop', but also 'length' and filtered output: return what we
// have rather than looping forever on a finish reason we don't handle
return choice.message.content ?? '';
}
if (choice.message.tool_calls) {
for (const toolCall of choice.message.tool_calls!) {
const args = JSON.parse(toolCall.function.arguments);
let result: unknown;
switch (toolCall.function.name) {
case 'search_products':
result = await searchProducts(args, userId);
break;
case 'get_order_status':
result = await getOrderStatus(args.orderId, userId);
break;
default:
result = { error: 'Unknown function' };
}
messages.push({
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result),
});
}
}
}
}
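One hardening step worth adding: `JSON.parse(toolCall.function.arguments)` trusts model output, and models occasionally emit malformed or out-of-range arguments. Validating before executing is cheap insurance. A hand-rolled sketch for `search_products` (a schema library such as zod works equally well):

```typescript
// Hand-rolled validation for search_products arguments: enough to
// reject malformed model output before it reaches the database.
interface SearchArgs { query: string; category?: string; maxPrice?: number; limit: number }

const CATEGORIES = ['electronics', 'clothing', 'books', 'home'];

function parseSearchArgs(raw: string): { ok: true; args: SearchArgs } | { ok: false; error: string } {
  let data: any;
  try { data = JSON.parse(raw); } catch { return { ok: false, error: 'arguments were not valid JSON' }; }
  if (typeof data.query !== 'string' || data.query.length === 0)
    return { ok: false, error: 'query must be a non-empty string' };
  if (data.category !== undefined && !CATEGORIES.includes(data.category))
    return { ok: false, error: `unknown category: ${data.category}` };
  if (data.maxPrice !== undefined && (typeof data.maxPrice !== 'number' || data.maxPrice <= 0))
    return { ok: false, error: 'maxPrice must be a positive number' };
  const limit = data.limit ?? 5;
  if (!Number.isInteger(limit) || limit < 1 || limit > 20)
    return { ok: false, error: 'limit must be an integer between 1 and 20' };
  return { ok: true, args: { query: data.query, category: data.category, maxPrice: data.maxPrice, limit } };
}
```

On failure, push the error string back as the tool result; the model will usually correct its arguments on the next turn.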
RAG: Retrieval-Augmented Generation
For domain-specific knowledge (your docs, your data), RAG retrieves relevant context before generating a response.
import { OpenAI } from 'openai';
import { PGVectorStore } from '@langchain/community/vectorstores/pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
const vectorStore = await PGVectorStore.initialize(embeddings, {
postgresConnectionOptions: { connectionString: process.env.DATABASE_URL },
tableName: 'document_embeddings',
columns: {
idColumnName: 'id',
vectorColumnName: 'embedding',
contentColumnName: 'content',
metadataColumnName: 'metadata',
},
});
async function ragChat(query: string, userId: string): Promise<string> {
// 1. Retrieve relevant documents
const relevantDocs = await vectorStore.similaritySearch(query, 5);
const context = relevantDocs
.map((doc, i) => `[Source ${i + 1}]: ${doc.pageContent}`)
.join('\n\n');
// 2. Generate response with context
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a helpful assistant. Answer questions using the provided context.
If the answer isn't in the context, say so — don't make up information.
Context:
${context}`,
},
{ role: 'user', content: query },
],
temperature: 0.3, // Lower temperature for factual responses
max_tokens: 800,
});
return response.choices[0].message.content ?? '';
}
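Retrieval quality depends heavily on how documents were chunked at ingestion time, which the snippet above takes as given. A minimal character-based chunker with overlap, as a sketch (sizes are illustrative; at roughly 4 characters per token, a 2000-character chunk is about 500 tokens):

```typescript
// Character-based chunking with overlap, so sentences cut at a chunk
// boundary still appear whole in the neighbouring chunk.
function chunkText(text: string, chunkSize = 2000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Each chunk is then embedded and stored alongside its source metadata; production pipelines often split on paragraph or heading boundaries instead of raw character offsets.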
Cost Control
OpenAI costs can grow unexpectedly. Implement guards:
// Per-user daily spend limit
async function checkUserSpendLimit(userId: string, estimatedCost: number): Promise<void> {
const todaySpend = await db('ai_usage')
.where({ user_id: userId })
.where('created_at', '>=', startOfDay(new Date()))
.sum('cost_usd as total')
.first();
const dailyLimit = 1.00; // $1.00 per user per day (adjust per plan)
if (Number(todaySpend?.total ?? 0) + estimatedCost > dailyLimit) {
throw new Error('Daily AI usage limit reached. Resets at midnight UTC.');
}
}
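The `estimatedCost` argument above has to be computed before the API call, so it can only ever be an estimate: prompt characters divided by four approximates prompt tokens, and `maxTokens` bounds the completion. A sketch using the same per-1K pricing as the client's cost table:

```typescript
// Pricing mirrors the client's cost table (USD per 1K tokens).
const COST_PER_1K: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 0.0025, output: 0.01 },
  'gpt-4o-mini': { input: 0.00015, output: 0.0006 },
};

// Pre-flight estimate: prompt chars / 4 ~= prompt tokens, and assume the
// completion uses its full maxTokens budget (worst case for a spend gate).
function estimateCostUsd(model: string, promptChars: number, maxTokens: number): number {
  const c = COST_PER_1K[model];
  if (!c) throw new Error(`unknown model: ${model}`);
  return (promptChars / 4 / 1000) * c.input + (maxTokens / 1000) * c.output;
}
```

Overestimating here is fine: the gate errs on the side of stopping a user slightly early rather than letting spend run past the limit.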
// Prompt truncation to prevent runaway token usage
// Prompt truncation to prevent runaway token usage
function truncateMessages(
messages: OpenAI.Chat.ChatCompletionMessageParam[],
maxTokens: number = 3000
): OpenAI.Chat.ChatCompletionMessageParam[] {
// Keep the system message (if any) plus the most recent messages that fit.
// Simple heuristic: ~4 chars per token.
const system = messages[0]?.role === 'system' ? [messages[0]] : [];
let totalChars = 0;
const result: typeof messages = [];
for (let i = messages.length - 1; i >= system.length; i--) {
const content = typeof messages[i].content === 'string'
? messages[i].content as string
: '';
if ((totalChars + content.length) / 4 > maxTokens) break;
totalChars += content.length;
result.unshift(messages[i]);
}
return [...system, ...result];
}
Model Selection Guide (2026)
| Use Case | Recommended Model | Cost/1M tokens |
|---|---|---|
| Simple Q&A, summaries, classification | gpt-4o-mini | $0.15 input / $0.60 output |
| Complex reasoning, multi-step tasks | gpt-4o | $2.50 input / $10 output |
| Long context (>100k tokens) | gpt-4-turbo | $10 input / $30 output |
| Embeddings (RAG) | text-embedding-3-small | $0.02 / 1M tokens |
| Image understanding | gpt-4o | $2.50 + $0.00765/image |
Default to gpt-4o-mini — it handles 80–90% of use cases at 1/17th the cost of gpt-4o. Upgrade to gpt-4o only when quality genuinely requires it.
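A routing guide like this is easy to encode so callers never hard-code model names. A hypothetical sketch (the task names and the 100k threshold are our illustration of the table above):

```typescript
type Task = 'qa' | 'summarize' | 'classify' | 'reasoning' | 'long_context';

// Cheap by default; gpt-4o only for multi-step reasoning; gpt-4-turbo
// only when the prompt exceeds the smaller models' practical context.
function pickModel(task: Task, promptTokens: number): string {
  if (promptTokens > 100_000) return 'gpt-4-turbo';
  if (task === 'reasoning') return 'gpt-4o';
  return 'gpt-4o-mini';
}
```

Centralizing the choice also means a pricing or model change is a one-line edit instead of a codebase-wide search.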
Implementation Costs
| Scope | Timeline | Investment |
|---|---|---|
| Basic OpenAI chat integration | 1–2 weeks | $4,000–$10,000 |
| Streaming chat + conversation history | 2–3 weeks | $8,000–$18,000 |
| Function calling + tool use | 2–4 weeks | $10,000–$25,000 |
| RAG system with vector search | 3–6 weeks | $15,000–$40,000 |
| Full AI feature suite | 2–4 months | $40,000–$120,000 |
Working With Viprasol
We integrate OpenAI and other LLM APIs into production applications — from simple chat features through function-calling agents, RAG systems, and AI-powered workflows.
→ AI integration consultation
→ AI & Machine Learning Services
→ Generative AI Consulting
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.