ChatGPT API Integration: Building Production AI Features with OpenAI
Integrating OpenAI's API into a production application is straightforward for a demo and genuinely complex at scale. Rate limits, cost control, streaming UX, context management, and prompt engineering all matter in ways that don't show up until you're handling real traffic.
This guide covers the complete production integration: API setup, streaming responses, function calling, RAG-based context injection, cost management, and the architectural patterns that prevent the AI layer from becoming a liability.
API Setup and Client Abstraction
The first architectural decision: never call the OpenAI API inline throughout your codebase. Centralize it behind an abstraction layer that gives you a single place to add retry logic, cost tracking, model switching, and fallbacks.
// lib/ai/client.ts — centralized LLM client
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
timeout: 30000, // 30-second timeout
maxRetries: 2, // Built-in retry on 429/500
});
interface ChatOptions {
model?: 'gpt-4o' | 'gpt-4o-mini' | 'gpt-4-turbo';
temperature?: number;
maxTokens?: number;
systemPrompt?: string;
userId?: string; // For cost attribution
}
interface ChatResult {
content: string;
usage: {
promptTokens: number;
completionTokens: number;
totalTokens: number;
estimatedCostUsd: number;
};
}
// Token cost table (April 2026)
const COST_PER_1K: Record<string, { input: number; output: number }> = {
'gpt-4o': { input: 0.0025, output: 0.01 },
'gpt-4o-mini': { input: 0.00015, output: 0.0006 },
'gpt-4-turbo': { input: 0.01, output: 0.03 },
};
export async function chat(
messages: OpenAI.Chat.ChatCompletionMessageParam[],
options: ChatOptions = {}
): Promise<ChatResult> {
const model = options.model ?? 'gpt-4o-mini';
const response = await openai.chat.completions.create({
model,
messages: [
...(options.systemPrompt ? [{ role: 'system' as const, content: options.systemPrompt }] : []),
...messages,
],
temperature: options.temperature ?? 0.7,
max_tokens: options.maxTokens ?? 1024,
});
const usage = response.usage!;
const costs = COST_PER_1K[model];
const estimatedCostUsd =
(usage.prompt_tokens / 1000) * costs.input +
(usage.completion_tokens / 1000) * costs.output;
// Track usage asynchronously (don't block the response); trackUsage is your
// own persistence helper, e.g. an insert into an ai_usage table
trackUsage({
model,
promptTokens: usage.prompt_tokens,
completionTokens: usage.completion_tokens,
costUsd: estimatedCostUsd,
userId: options.userId,
}).catch(console.error);
return {
content: response.choices[0].message.content ?? '',
usage: {
promptTokens: usage.prompt_tokens,
completionTokens: usage.completion_tokens,
totalTokens: usage.total_tokens,
estimatedCostUsd,
},
};
}
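The abstraction layer is also the natural place for the fallbacks mentioned above. A minimal sketch of the pattern (the `withFallback` name and shape are ours, not part of the OpenAI SDK):

```typescript
// Illustrative helper: run the primary model call, and fall back to an
// alternative (a cheaper model, or another provider) if it throws.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch {
    // Rate limit, timeout, or outage on the primary: degrade gracefully
    return fallback();
  }
}
```

In practice you would wrap two `chat()` calls with different `model` options, and log which path actually served the request so fallback rates are visible in monitoring.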
Streaming Responses
For chat interfaces, streaming is essential. A 3-second wait for a response feels broken; streamed tokens appearing within 200ms feels fast even if total time is the same.
Server-Side Streaming (Node.js + SSE)
// API route: streaming chat endpoint
app.post('/api/chat/stream', authenticate, async (req, res) => {
const { messages, conversationId } = req.body;
// Set SSE headers
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
res.setHeader('X-Accel-Buffering', 'no'); // disable nginx buffering
res.flushHeaders(); // send headers now so the client sees the stream open immediately
try {
const stream = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: await buildMessages(conversationId, messages, req.user.sub),
stream: true,
max_tokens: 1024,
});
let fullContent = '';
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta?.content;
if (delta) {
fullContent += delta;
res.write(`data: ${JSON.stringify({ content: delta })}\n\n`);
}
}
// Save complete response to conversation history
await saveAssistantMessage(conversationId, fullContent);
res.write('data: [DONE]\n\n');
res.end();
} catch (err: any) {
res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
res.end();
}
});
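One detail the route above omits: if the client disconnects mid-stream, the upstream completion keeps running (and billing). The fix is to abort the OpenAI request when the response socket closes. A sketch of the pattern, with `doStream` standing in for the OpenAI call; in real code you would pass `controller.signal` as the request's abort signal and wire `req.on('close', ...)`:

```typescript
// Abort upstream work when the SSE client disconnects.
async function streamWithAbort(
  doStream: (signal: AbortSignal) => Promise<string>,
  onClientClose: (cb: () => void) => void
): Promise<string | null> {
  const controller = new AbortController();
  onClientClose(() => controller.abort()); // e.g. req.on('close', ...)
  try {
    return await doStream(controller.signal);
  } catch (err) {
    // If we aborted it ourselves, the client went away: not an error
    if (controller.signal.aborted) return null;
    throw err;
  }
}
```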
Client-Side Streaming (React)
function ChatInterface({ conversationId }: { conversationId: string }) {
const [messages, setMessages] = useState<Message[]>([]);
const [streaming, setStreaming] = useState(false);
const sendMessage = async (userInput: string) => {
const userMsg: Message = { role: 'user', content: userInput };
setMessages(prev => [...prev, userMsg]);
setStreaming(true);
// Add empty assistant message to stream into
setMessages(prev => [...prev, { role: 'assistant', content: '' }]);
const response = await fetch('/api/chat/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ messages: [userMsg], conversationId }),
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// A network read can end mid-event, so buffer partial data between reads
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop() ?? ''; // keep the incomplete trailing event
const lines = events.flatMap(e => e.split('\n')).filter(line => line.startsWith('data: '));
for (const line of lines) {
const data = line.slice(6);
if (data === '[DONE]') continue;
const { content, error } = JSON.parse(data);
if (error) console.error(error);
if (content) {
setMessages(prev => {
const updated = [...prev];
updated[updated.length - 1] = {
role: 'assistant',
content: updated[updated.length - 1].content + content,
};
return updated;
});
}
}
}
setStreaming(false);
};
return (
<div>
<MessageList messages={messages} />
<ChatInput onSend={sendMessage} disabled={streaming} />
</div>
);
}
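The chunk handling inside that read loop is the part most worth unit-testing, because SSE events routinely split across network reads. It can be factored into a pure helper (the name is ours):

```typescript
// Pure SSE parser: combine the leftover buffer with a new chunk and
// return every complete `data:` payload plus the new partial remainder.
function parseSseChunk(
  buffer: string,
  chunk: string
): { events: string[]; buffer: string } {
  const parts = (buffer + chunk).split('\n\n');
  const rest = parts.pop() ?? ''; // last piece may be an incomplete event
  const events = parts
    .flatMap(p => p.split('\n'))
    .filter(line => line.startsWith('data: '))
    .map(line => line.slice(6));
  return { events, buffer: rest };
}
```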
Function Calling
Function calling lets the model invoke structured tools — database queries, API calls, calculations — with type-safe arguments.
// Define tools the model can call
const tools: OpenAI.Chat.ChatCompletionTool[] = [
{
type: 'function',
function: {
name: 'search_products',
description: 'Search the product catalog by keyword, category, or price range',
parameters: {
type: 'object',
properties: {
query: { type: 'string', description: 'Search query' },
category: { type: 'string', enum: ['electronics', 'clothing', 'books', 'home'] },
maxPrice: { type: 'number', description: 'Maximum price in USD' },
limit: { type: 'number', description: 'Number of results (1-20)', default: 5 },
},
required: ['query'],
},
},
},
{
type: 'function',
function: {
name: 'get_order_status',
description: "Get the status of a customer's order",
parameters: {
type: 'object',
properties: {
orderId: { type: 'string', description: 'Order ID' },
},
required: ['orderId'],
},
},
},
];
// Agentic loop: model calls tools, results fed back, model continues
async function agentChat(userMessage: string, userId: string): Promise<string> {
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
{ role: 'user', content: userMessage },
];
while (true) {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
tools,
tool_choice: 'auto',
});
const choice = response.choices[0];
messages.push(choice.message);
if (choice.finish_reason !== 'tool_calls') {
// Covers 'stop', but also 'length' and filtered output: return what we
// have rather than looping forever on a finish reason we don't handle
return choice.message.content ?? '';
}
if (choice.message.tool_calls) {
for (const toolCall of choice.message.tool_calls!) {
const args = JSON.parse(toolCall.function.arguments);
let result: unknown;
switch (toolCall.function.name) {
case 'search_products':
result = await searchProducts(args, userId);
break;
case 'get_order_status':
result = await getOrderStatus(args.orderId, userId);
break;
default:
result = { error: 'Unknown function' };
}
messages.push({
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result),
});
}
}
}
}
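One hardening step worth adding: `JSON.parse(toolCall.function.arguments)` trusts model output, and models occasionally emit malformed or out-of-range arguments. Validating before executing is cheap insurance. A hand-rolled sketch for `search_products` (a schema library such as zod works equally well):

```typescript
// Hand-rolled validation for search_products arguments: enough to
// reject malformed model output before it reaches the database.
interface SearchArgs { query: string; category?: string; maxPrice?: number; limit: number }

const CATEGORIES = ['electronics', 'clothing', 'books', 'home'];

function parseSearchArgs(raw: string): { ok: true; args: SearchArgs } | { ok: false; error: string } {
  let data: any;
  try { data = JSON.parse(raw); } catch { return { ok: false, error: 'arguments were not valid JSON' }; }
  if (typeof data.query !== 'string' || data.query.length === 0)
    return { ok: false, error: 'query must be a non-empty string' };
  if (data.category !== undefined && !CATEGORIES.includes(data.category))
    return { ok: false, error: `unknown category: ${data.category}` };
  if (data.maxPrice !== undefined && (typeof data.maxPrice !== 'number' || data.maxPrice <= 0))
    return { ok: false, error: 'maxPrice must be a positive number' };
  const limit = data.limit ?? 5;
  if (!Number.isInteger(limit) || limit < 1 || limit > 20)
    return { ok: false, error: 'limit must be an integer between 1 and 20' };
  return { ok: true, args: { query: data.query, category: data.category, maxPrice: data.maxPrice, limit } };
}
```

On failure, push the error string back as the tool result; the model will usually correct its arguments on the next turn.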
RAG: Retrieval-Augmented Generation
For domain-specific knowledge (your docs, your data), RAG retrieves relevant context before generating a response.
import { OpenAI } from 'openai';
import { PGVectorStore } from '@langchain/community/vectorstores/pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
const vectorStore = await PGVectorStore.initialize(embeddings, {
postgresConnectionOptions: { connectionString: process.env.DATABASE_URL },
tableName: 'document_embeddings',
columns: {
idColumnName: 'id',
vectorColumnName: 'embedding',
contentColumnName: 'content',
metadataColumnName: 'metadata',
},
});
async function ragChat(query: string, userId: string): Promise<string> {
// 1. Retrieve relevant documents
const relevantDocs = await vectorStore.similaritySearch(query, 5);
const context = relevantDocs
.map((doc, i) => `[Source ${i + 1}]: ${doc.pageContent}`)
.join('\n\n');
// 2. Generate response with context
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a helpful assistant. Answer questions using the provided context.
If the answer isn't in the context, say so — don't make up information.
Context:
${context}`,
},
{ role: 'user', content: query },
],
temperature: 0.3, // Lower temperature for factual responses
max_tokens: 800,
});
return response.choices[0].message.content ?? '';
}
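Retrieval quality depends heavily on how documents were chunked at ingestion time, which the snippet above takes as given. A minimal character-based chunker with overlap, as a sketch (sizes are illustrative; at roughly 4 characters per token, a 2000-character chunk is about 500 tokens):

```typescript
// Character-based chunking with overlap, so sentences cut at a chunk
// boundary still appear whole in the neighbouring chunk.
function chunkText(text: string, chunkSize = 2000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Each chunk is then embedded and stored alongside its source metadata; production pipelines often split on paragraph or heading boundaries instead of raw character offsets.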
Cost Control
OpenAI costs can grow unexpectedly. Implement guards:
// Per-user daily spend limit
async function checkUserSpendLimit(userId: string, estimatedCost: number): Promise<void> {
const todaySpend = await db('ai_usage')
.where({ user_id: userId })
.where('created_at', '>=', startOfDay(new Date()))
.sum('cost_usd as total')
.first();
const dailyLimit = 1.00; // $1.00 per user per day (adjust per plan)
if (Number(todaySpend?.total ?? 0) + estimatedCost > dailyLimit) {
throw new Error('Daily AI usage limit reached. Resets at midnight UTC.');
}
}
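The `estimatedCost` argument above has to be computed before the API call, so it can only ever be an estimate: prompt characters divided by four approximates prompt tokens, and `maxTokens` bounds the completion. A sketch using the same per-1K pricing as the client's cost table:

```typescript
// Pricing mirrors the client's cost table (USD per 1K tokens).
const COST_PER_1K: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 0.0025, output: 0.01 },
  'gpt-4o-mini': { input: 0.00015, output: 0.0006 },
};

// Pre-flight estimate: prompt chars / 4 ~= prompt tokens, and assume the
// completion uses its full maxTokens budget (worst case for a spend gate).
function estimateCostUsd(model: string, promptChars: number, maxTokens: number): number {
  const c = COST_PER_1K[model];
  if (!c) throw new Error(`unknown model: ${model}`);
  return (promptChars / 4 / 1000) * c.input + (maxTokens / 1000) * c.output;
}
```

Overestimating here is fine: the gate errs on the side of stopping a user slightly early rather than letting spend run past the limit.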
// Prompt truncation to prevent runaway token usage
// Prompt truncation to prevent runaway token usage
function truncateMessages(
messages: OpenAI.Chat.ChatCompletionMessageParam[],
maxTokens: number = 3000
): OpenAI.Chat.ChatCompletionMessageParam[] {
// Keep the system message (if any) plus the most recent messages that fit.
// Simple heuristic: ~4 chars per token.
const system = messages[0]?.role === 'system' ? [messages[0]] : [];
let totalChars = 0;
const result: typeof messages = [];
for (let i = messages.length - 1; i >= system.length; i--) {
const content = typeof messages[i].content === 'string'
? messages[i].content as string
: '';
if ((totalChars + content.length) / 4 > maxTokens) break;
totalChars += content.length;
result.unshift(messages[i]);
}
return [...system, ...result];
}
Model Selection Guide (2026)
| Use Case | Recommended Model | Cost/1M tokens |
|---|---|---|
| Simple Q&A, summaries, classification | gpt-4o-mini | $0.15 input / $0.60 output |
| Complex reasoning, multi-step tasks | gpt-4o | $2.50 input / $10 output |
| Long context (>100k tokens) | gpt-4-turbo | $10 input / $30 output |
| Embeddings (RAG) | text-embedding-3-small | $0.02 / 1M tokens |
| Image understanding | gpt-4o | $2.50 + $0.00765/image |
Default to gpt-4o-mini — it handles 80–90% of use cases at 1/17th the cost of gpt-4o. Upgrade to gpt-4o only when quality genuinely requires it.
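A routing guide like this is easy to encode so callers never hard-code model names. A hypothetical sketch (the task names and the 100k threshold are our illustration of the table above):

```typescript
type Task = 'qa' | 'summarize' | 'classify' | 'reasoning' | 'long_context';

// Cheap by default; gpt-4o only for multi-step reasoning; gpt-4-turbo
// only when the prompt exceeds the smaller models' practical context.
function pickModel(task: Task, promptTokens: number): string {
  if (promptTokens > 100_000) return 'gpt-4-turbo';
  if (task === 'reasoning') return 'gpt-4o';
  return 'gpt-4o-mini';
}
```

Centralizing the choice also means a pricing or model change is a one-line edit instead of a codebase-wide search.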
Implementation Costs
| Scope | Timeline | Investment |
|---|---|---|
| Basic OpenAI chat integration | 1–2 weeks | $4,000–$10,000 |
| Streaming chat + conversation history | 2–3 weeks | $8,000–$18,000 |
| Function calling + tool use | 2–4 weeks | $10,000–$25,000 |
| RAG system with vector search | 3–6 weeks | $15,000–$40,000 |
| Full AI feature suite | 2–4 months | $40,000–$120,000 |
Working With Viprasol
We integrate OpenAI and other LLM APIs into production applications — from simple chat features through function-calling agents, RAG systems, and AI-powered workflows.
→ AI integration consultation
→ AI & Machine Learning Services
→ Generative AI Consulting
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.