
ChatGPT API Integration: Building Production AI Features with OpenAI

ChatGPT API integration in 2026 — OpenAI API setup, prompt engineering, streaming responses, function calling, RAG implementation, cost control, and production patterns.

Viprasol Tech Team
April 6, 2026
12 min read


Integrating OpenAI's API into a production application is straightforward for a demo and genuinely complex at scale. Rate limits, cost control, streaming UX, context management, and prompt engineering all matter in ways that don't show up until you're handling real traffic.

This guide covers the complete production integration: API setup, streaming responses, function calling, RAG-based context injection, cost management, and the architectural patterns that prevent the AI layer from becoming a liability.


API Setup and Client Abstraction

The first architectural decision: never call the OpenAI API inline throughout your codebase. Centralize it behind an abstraction layer that gives you a single place to add retry logic, cost tracking, model switching, and fallbacks.

// lib/ai/client.ts — centralized LLM client
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  timeout: 30000,    // 30-second timeout
  maxRetries: 2,     // Built-in retry on 429/500
});

interface ChatOptions {
  model?: 'gpt-4o' | 'gpt-4o-mini' | 'gpt-4-turbo';
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
  userId?: string;  // For cost attribution
}

interface ChatResult {
  content: string;
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
    estimatedCostUsd: number;
  };
}

// Token cost table (April 2026)
const COST_PER_1K: Record<string, { input: number; output: number }> = {
  'gpt-4o':       { input: 0.0025, output: 0.01 },
  'gpt-4o-mini':  { input: 0.00015, output: 0.0006 },
  'gpt-4-turbo':  { input: 0.01, output: 0.03 },
};

export async function chat(
  messages: OpenAI.Chat.ChatCompletionMessageParam[],
  options: ChatOptions = {}
): Promise<ChatResult> {
  const model = options.model ?? 'gpt-4o-mini';
  
  const response = await openai.chat.completions.create({
    model,
    messages: [
      ...(options.systemPrompt ? [{ role: 'system' as const, content: options.systemPrompt }] : []),
      ...messages,
    ],
    temperature: options.temperature ?? 0.7,
    max_tokens: options.maxTokens ?? 1024,
  });

  const usage = response.usage!;
  const costs = COST_PER_1K[model];
  const estimatedCostUsd = 
    (usage.prompt_tokens / 1000) * costs.input +
    (usage.completion_tokens / 1000) * costs.output;

  // Track usage asynchronously (don't block response)
  trackUsage({
    model,
    promptTokens: usage.prompt_tokens,
    completionTokens: usage.completion_tokens,
    costUsd: estimatedCostUsd,
    userId: options.userId,
  }).catch(console.error);

  return {
    content: response.choices[0].message.content ?? '',
    usage: {
      promptTokens: usage.prompt_tokens,
      completionTokens: usage.completion_tokens,
      totalTokens: usage.total_tokens,
      estimatedCostUsd,
    },
  };
}
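The abstraction layer is also the natural home for fallbacks: if the preferred model is rate-limited or down, retry on a cheaper one. A sketch of that policy (the `callModel` parameter is injected here so the logic is testable without network access; in the real client it would wrap `openai.chat.completions.create`):

```typescript
type ModelCall = (model: string) => Promise<string>;

// Try each model in order; return the first success.
// A 429 or 5xx on the preferred model falls through to the next.
export async function chatWithFallback(
  callModel: ModelCall,
  models: string[] = ['gpt-4o', 'gpt-4o-mini']
): Promise<{ content: string; modelUsed: string }> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return { content: await callModel(model), modelUsed: model };
    } catch (err) {
      lastError = err; // remember the failure, try the next model
    }
  }
  throw lastError;
}
```

Because the fallback order is just an array, per-feature policies (e.g. never downgrade for a legal-summary feature) become a one-line configuration change.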

Streaming Responses

For chat interfaces, streaming is essential. A 3-second wait for a response feels broken; streamed tokens appearing within 200ms feels fast even if total time is the same.

Server-Side Streaming (Node.js + SSE)

// API route: streaming chat endpoint
app.post('/api/chat/stream', authenticate, async (req, res) => {
  const { messages, conversationId } = req.body;

  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no');

  try {
    const stream = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: await buildMessages(conversationId, messages, req.user.sub),
      stream: true,
      max_tokens: 1024,
    });

    let fullContent = '';

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta?.content;
      if (delta) {
        fullContent += delta;
        res.write(`data: ${JSON.stringify({ content: delta })}\n\n`);
      }
    }

    // Save complete response to conversation history
    await saveAssistantMessage(conversationId, fullContent);

    res.write('data: [DONE]\n\n');
    res.end();
  } catch (err: any) {
    res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
    res.end();
  }
});

Client-Side Streaming (React)

function ChatInterface({ conversationId }: { conversationId: string }) {
  const [messages, setMessages] = useState<Message[]>([]);
  const [streaming, setStreaming] = useState(false);

  const sendMessage = async (userInput: string) => {
    const userMsg: Message = { role: 'user', content: userInput };
    setMessages(prev => [...prev, userMsg]);
    setStreaming(true);

    // Add empty assistant message to stream into
    setMessages(prev => [...prev, { role: 'assistant', content: '' }]);

    const response = await fetch('/api/chat/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: [userMsg], conversationId }),
    });

    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // SSE events can be split across network chunks, so buffer
      // partial lines until a complete `data: ...` line arrives —
      // otherwise JSON.parse can throw on a half-received payload.
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // keep any trailing partial line

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6);
        if (data === '[DONE]') continue;

        const { content } = JSON.parse(data);
        if (content) {
          setMessages(prev => {
            const updated = [...prev];
            updated[updated.length - 1] = {
              role: 'assistant',
              content: updated[updated.length - 1].content + content,
            };
            return updated;
          });
        }
      }
    }

    setStreaming(false);
  };

  return (
    <div>
      <MessageList messages={messages} />
      <ChatInput onSend={sendMessage} disabled={streaming} />
    </div>
  );
}

🤖 AI Is Not the Future — It Is Right Now

Businesses using AI automation cut manual work by 60–80%. We build production-ready AI systems — RAG pipelines, LLM integrations, custom ML models, and AI agent workflows.

  • LLM integration (OpenAI, Anthropic, Gemini, local models)
  • RAG systems that answer from your own data
  • AI agents that take real actions — not just chat
  • Custom ML models for prediction, classification, detection

Function Calling

Function calling lets the model invoke structured tools — database queries, API calls, calculations — with type-safe arguments.

// Define tools the model can call
const tools: OpenAI.Chat.ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'search_products',
      description: 'Search the product catalog by keyword, category, or price range',
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'Search query' },
          category: { type: 'string', enum: ['electronics', 'clothing', 'books', 'home'] },
          maxPrice: { type: 'number', description: 'Maximum price in USD' },
          limit: { type: 'number', description: 'Number of results (1-20)', default: 5 },
        },
        required: ['query'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'get_order_status',
      description: "Get the status of a customer's order",
      parameters: {
        type: 'object',
        properties: {
          orderId: { type: 'string', description: 'Order ID' },
        },
        required: ['orderId'],
      },
    },
  },
];

// Agentic loop: model calls tools, results fed back, model continues
async function agentChat(userMessage: string, userId: string): Promise<string> {
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'user', content: userMessage },
  ];

  while (true) {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages,
      tools,
      tool_choice: 'auto',
    });

    const choice = response.choices[0];
    messages.push(choice.message);

    // Return on any finish reason other than 'tool_calls' ('stop',
    // 'length', 'content_filter'), so an unexpected finish reason
    // can't spin this loop forever.
    if (choice.finish_reason !== 'tool_calls') {
      return choice.message.content ?? '';
    }

    if (choice.message.tool_calls) {
      for (const toolCall of choice.message.tool_calls!) {
        const args = JSON.parse(toolCall.function.arguments);
        let result: unknown;

        switch (toolCall.function.name) {
          case 'search_products':
            result = await searchProducts(args, userId);
            break;
          case 'get_order_status':
            result = await getOrderStatus(args.orderId, userId);
            break;
          default:
            result = { error: 'Unknown function' };
        }

        messages.push({
          role: 'tool',
          tool_call_id: toolCall.id,
          content: JSON.stringify(result),
        });
      }
    }
  }
}
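The loop above trusts `JSON.parse` on model-generated arguments, but the model can emit malformed JSON or out-of-range values. A hand-rolled guard for the `search_products` arguments (a schema library like zod would express the same checks more compactly); on failure, the error string can be sent back as the tool result so the model retries with corrected arguments:

```typescript
interface SearchArgs {
  query: string;
  category?: string;
  maxPrice?: number;
  limit: number;
}

const CATEGORIES = ['electronics', 'clothing', 'books', 'home'];

// Validate raw tool-call arguments before they reach real code.
export function parseSearchArgs(
  rawJson: string
): { ok: true; args: SearchArgs } | { ok: false; error: string } {
  let raw: any;
  try {
    raw = JSON.parse(rawJson);
  } catch {
    return { ok: false, error: 'arguments were not valid JSON' };
  }
  if (typeof raw?.query !== 'string' || raw.query.length === 0) {
    return { ok: false, error: 'query must be a non-empty string' };
  }
  if (raw.category !== undefined && !CATEGORIES.includes(raw.category)) {
    return { ok: false, error: 'unknown category' };
  }
  if (raw.maxPrice !== undefined && (typeof raw.maxPrice !== 'number' || raw.maxPrice <= 0)) {
    return { ok: false, error: 'maxPrice must be a positive number' };
  }
  const limit = raw.limit ?? 5; // apply the schema default
  if (typeof limit !== 'number' || limit < 1 || limit > 20) {
    return { ok: false, error: 'limit must be between 1 and 20' };
  }
  return {
    ok: true,
    args: { query: raw.query, category: raw.category, maxPrice: raw.maxPrice, limit },
  };
}
```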

RAG: Retrieval-Augmented Generation

For domain-specific knowledge (your docs, your data), RAG retrieves relevant context before generating a response.

import { OpenAI } from 'openai';
import { PGVectorStore } from '@langchain/community/vectorstores/pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';

const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });

const vectorStore = await PGVectorStore.initialize(embeddings, {
  postgresConnectionOptions: { connectionString: process.env.DATABASE_URL },
  tableName: 'document_embeddings',
  columns: {
    idColumnName: 'id',
    vectorColumnName: 'embedding',
    contentColumnName: 'content',
    metadataColumnName: 'metadata',
  },
});

async function ragChat(query: string, userId: string): Promise<string> {
  // 1. Retrieve relevant documents
  const relevantDocs = await vectorStore.similaritySearch(query, 5);
  
  const context = relevantDocs
    .map((doc, i) => `[Source ${i + 1}]: ${doc.pageContent}`)
    .join('\n\n');

  // 2. Generate response with context
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `You are a helpful assistant. Answer questions using the provided context.
If the answer isn't in the context, say so — don't make up information.

Context:
${context}`,
      },
      { role: 'user', content: query },
    ],
    temperature: 0.3,  // Lower temperature for factual responses
    max_tokens: 800,
  });

  return response.choices[0].message.content ?? '';
}
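Retrieval quality depends on ingestion: before anything lands in `document_embeddings`, documents must be split into chunks small enough to embed and retrieve precisely. A minimal fixed-size splitter with overlap (a deliberate simplification; real splitters such as LangChain's break on paragraph and sentence boundaries instead of raw character counts):

```typescript
// Split text into overlapping chunks so context straddling a chunk
// boundary isn't lost. Sizes are in characters (~4 chars per token,
// so 2000 chars ≈ 500 tokens per chunk).
export function chunkText(
  text: string,
  chunkSize: number = 2000,
  overlap: number = 200
): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be < chunkSize');
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Each chunk would then be passed to `vectorStore.addDocuments(...)` with metadata identifying its source document, so RAG answers can cite where they came from.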

⚡ Your Competitors Are Already Using AI — Are You?

We build AI systems that actually work in production — not demos that die in a Colab notebook. From data pipeline to deployed model to real business outcomes.

  • AI agent systems that run autonomously — not just chatbots
  • Integrates with your existing tools (CRM, ERP, Slack, etc.)
  • Explainable outputs — know why the model decided what it did
  • Free AI opportunity audit for your business

Cost Control

OpenAI costs can grow unexpectedly. Implement guards:

// Per-user daily spend limit
async function checkUserSpendLimit(userId: string, estimatedCost: number): Promise<void> {
  const todaySpend = await db('ai_usage')
    .where({ user_id: userId })
    .where('created_at', '>=', startOfDay(new Date()))
    .sum('cost_usd as total')
    .first();

  const dailyLimit = 1.00; // $1.00 per user per day (adjust per plan)
  
  if (Number(todaySpend?.total ?? 0) + estimatedCost > dailyLimit) {
    throw new Error('Daily AI usage limit reached. Resets at midnight UTC.');
  }
}
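`checkUserSpendLimit` needs an `estimatedCost` before the request is actually made. A rough worst-case pre-call estimate, using the same ~4 chars/token heuristic for the prompt and assuming the output hits `maxTokens` (rates mirror the `COST_PER_1K` table above):

```typescript
// Per-1K-token rates, matching the client's cost table.
const RATES: Record<string, { input: number; output: number }> = {
  'gpt-4o':      { input: 0.0025,  output: 0.01 },
  'gpt-4o-mini': { input: 0.00015, output: 0.0006 },
};

// Worst-case cost estimate before the call: prompt tokens from the
// ~4 chars/token heuristic, output assumed to use the full budget.
export function estimateCostUsd(
  promptChars: number,
  maxTokens: number,
  model: string = 'gpt-4o-mini'
): number {
  const r = RATES[model];
  if (!r) throw new Error(`no rate table entry for ${model}`);
  const promptTokens = Math.ceil(promptChars / 4);
  return (promptTokens / 1000) * r.input + (maxTokens / 1000) * r.output;
}
```

Overestimating here is fine: it means the limit check is conservative, and the precise figure from `response.usage` is what gets written to `ai_usage` afterward.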

// Prompt truncation to prevent runaway token usage
function truncateMessages(
  messages: OpenAI.Chat.ChatCompletionMessageParam[],
  maxTokens: number = 3000
): OpenAI.Chat.ChatCompletionMessageParam[] {
  // Keep the system message (if any) plus the most recent messages
  // that fit the budget. Simple heuristic: ~4 chars per token.
  const [system, rest] =
    messages[0]?.role === 'system'
      ? [messages[0], messages.slice(1)]
      : [undefined, messages];

  let totalChars = 0;
  const result: typeof messages = [];

  for (let i = rest.length - 1; i >= 0; i--) {
    const content = typeof rest[i].content === 'string'
      ? (rest[i].content as string)
      : '';
    totalChars += content.length;

    if (totalChars / 4 > maxTokens) break;
    result.unshift(rest[i]);
  }

  return system ? [system, ...result] : result;
}

Model Selection Guide (2026)

| Use Case | Recommended Model | Cost/1M tokens |
| --- | --- | --- |
| Simple Q&A, summaries, classification | gpt-4o-mini | $0.15 input / $0.60 output |
| Complex reasoning, multi-step tasks | gpt-4o | $2.50 input / $10 output |
| Long context (>100k tokens) | gpt-4-turbo | $10 input / $30 output |
| Embeddings (RAG) | text-embedding-3-small | $0.02 / 1M tokens |
| Image understanding | gpt-4o | $2.50 + $0.00765/image |

Default to gpt-4o-mini — it handles 80–90% of use cases at 1/17th the cost of gpt-4o. Upgrade to gpt-4o only when quality genuinely requires it.
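That policy can be made explicit in code, so call sites declare intent rather than hard-coding model names (the task categories here are illustrative):

```typescript
type TaskKind = 'qa' | 'summarize' | 'classify' | 'reason' | 'long-context';

// Route each task to the cheapest model that handles it well,
// following the table above. Revisit as models and prices change.
export function pickModel(task: TaskKind): string {
  switch (task) {
    case 'reason':
      return 'gpt-4o';
    case 'long-context':
      return 'gpt-4-turbo';
    default:
      return 'gpt-4o-mini';
  }
}
```

With routing centralized, a price change or a new model is a one-function edit instead of a codebase-wide search.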


Implementation Costs

| Scope | Timeline | Investment |
| --- | --- | --- |
| Basic OpenAI chat integration | 1–2 weeks | $4,000–$10,000 |
| Streaming chat + conversation history | 2–3 weeks | $8,000–$18,000 |
| Function calling + tool use | 2–4 weeks | $10,000–$25,000 |
| RAG system with vector search | 3–6 weeks | $15,000–$40,000 |
| Full AI feature suite | 2–4 months | $40,000–$120,000 |

Working With Viprasol

We integrate OpenAI and other LLM APIs into production applications — from simple chat features through function-calling agents, RAG systems, and AI-powered workflows.

AI integration consultation →
AI & Machine Learning Services →
Generative AI Consulting →



About the Author


Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading

Want to Implement AI in Your Business?

From chatbots to predictive models — harness the power of AI with a team that delivers.

Free consultation • No commitment • Response within 24 hours
