
LLM Integration in Production: OpenAI SDK, Streaming, Function Calling, and Cost Control

Build production LLM integrations in 2026 — OpenAI SDK streaming responses, function calling with typed tools, prompt versioning, token cost estimation, and retry logic with exponential backoff.

Viprasol Tech Team
June 18, 2026
13 min read

Integrating LLMs into a production SaaS product is more complex than calling the OpenAI API in a demo. Production concerns include: streaming for perceived performance, function calling for structured outputs, prompt versioning, retry logic for rate limits, cost monitoring, and observability.


The OpenAI SDK Setup

// lib/openai.ts
import OpenAI from 'openai';

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,       // Automatic retry on 429/5xx
  timeout: 30_000,     // 30s timeout per request
});

// Model reference — centralize to make upgrades easy
export const MODELS = {
  fast: 'gpt-4o-mini',        // Cheap, fast — routing, classification, short tasks
  smart: 'gpt-4o',             // Expensive, capable — complex reasoning, long context
  embedding: 'text-embedding-3-small',
} as const;
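Centralizing model names also makes it easy to route requests by task type instead of hardcoding one model everywhere. A minimal sketch (the task labels and the `pickModel` helper are our illustration, not part of the setup above):

```typescript
// Hypothetical task-based model router. The task names and the
// mapping below are illustrative assumptions; tune them for your app.
type LLMTask = 'classify' | 'summarize' | 'reason';

const MODELS = {
  fast: 'gpt-4o-mini',   // Cheap, fast tier
  smart: 'gpt-4o',       // Expensive, capable tier
} as const;

export function pickModel(task: LLMTask): string {
  // Route only open-ended reasoning to the expensive model;
  // classification and summarization run fine on the cheap tier.
  return task === 'reason' ? MODELS.smart : MODELS.fast;
}
```

This keeps model-upgrade decisions in one place: when a new model ships, you change the registry, not every call site.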

Streaming Responses

Streaming makes LLM responses feel instant — the user sees tokens as they're generated rather than waiting for the full response:

// app/api/chat/route.ts — Next.js streaming route
import { openai, MODELS } from '@/lib/openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';  // Vercel AI SDK v3 helpers (v4 replaced these with streamText)

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json();

  const response = await openai.chat.completions.create({
    model: MODELS.smart,
    messages: [
      { role: 'system', content: systemPrompt },
      ...messages,
    ],
    stream: true,
    max_tokens: 1000,
    temperature: 0.7,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
// Alternative (separate route file): manual streaming without the Vercel AI SDK
export async function POST(req: Request) {
  const { message } = await req.json();  // Parse the body once, up front

  const stream = new TransformStream();
  const writer = stream.writable.getWriter();
  const encoder = new TextEncoder();

  const completion = await openai.chat.completions.create({
    model: MODELS.smart,
    messages: [{ role: 'user', content: message }],
    stream: true,
  });

  (async () => {
    for await (const chunk of completion) {
      const content = chunk.choices[0]?.delta?.content ?? '';
      if (content) {
        await writer.write(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`));
      }
    }
    await writer.write(encoder.encode('data: [DONE]\n\n'));
    await writer.close();
  })();

  return new Response(stream.readable, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  });
}

Client-side streaming consumption:

// components/ChatInterface.tsx
'use client';
import { useChat } from 'ai/react';  // Vercel AI SDK hook

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      <div className="messages">
        {messages.map(m => (
          <div key={m.id} className={m.role === 'user' ? 'user' : 'assistant'}>
            {m.content}
          </div>
        ))}
        {isLoading && <div className="typing-indicator">...</div>}
      </div>

      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask something..." />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}
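If you consume the manual SSE route without the `useChat` hook, the client has to parse `data:` lines itself. A minimal parser sketch (the event format matches the manual route above; the `parseSSEChunk` name is ours):

```typescript
// Parse one SSE chunk into content strings. Matches the
// `data: {"content": "..."}` / `data: [DONE]` format emitted by the
// manual streaming route.
export function parseSSEChunk(chunk: string): string[] {
  const contents: string[] = [];
  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break;
    try {
      const parsed = JSON.parse(payload) as { content?: string };
      if (parsed.content) contents.push(parsed.content);
    } catch {
      // Ignore JSON split across network chunks; a real client
      // buffers incomplete lines until the next chunk arrives
    }
  }
  return contents;
}
```

In a real client, pair this with `fetch` and `response.body.getReader()`, carrying any incomplete trailing line over to the next chunk before parsing.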

🤖 AI Is Not the Future — It Is Right Now

Businesses using AI automation cut manual work by 60–80%. We build production-ready AI systems — RAG pipelines, LLM integrations, custom ML models, and AI agent workflows.

  • LLM integration (OpenAI, Anthropic, Gemini, local models)
  • RAG systems that answer from your own data
  • AI agents that take real actions — not just chat
  • Custom ML models for prediction, classification, detection

Function Calling (Tool Use)

Function calling lets the LLM invoke structured tools — returning typed JSON instead of prose:

// lib/llm-tools.ts
import OpenAI from 'openai';
import { openai, MODELS } from './openai';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import { db } from './db';  // Your database client (Prisma-style API assumed)

// Define a typed tool using Zod
const getOrdersTool = {
  name: 'get_orders',
  description: 'Retrieve orders for a customer, optionally filtered by status or date range',
  parameters: zodToJsonSchema(z.object({
    customerId: z.string().describe('Customer UUID'),
    status: z.enum(['PENDING', 'PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED']).optional(),
    fromDate: z.string().optional().describe('ISO date string e.g. 2026-01-01'),
    toDate: z.string().optional().describe('ISO date string'),
    limit: z.number().min(1).max(50).default(10),
  })),
};

const updateOrderStatusTool = {
  name: 'update_order_status',
  description: 'Update the status of an order (requires manager permissions)',
  parameters: zodToJsonSchema(z.object({
    orderId: z.string().describe('Order UUID'),
    status: z.enum(['PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED']),
    reason: z.string().optional().describe('Reason for status change'),
  })),
};

// Execute tool calls with type safety
async function executeTool(
  toolName: string,
  args: unknown,
  context: { userId: string; tenantId: string },
): Promise<string> {
  switch (toolName) {
    case 'get_orders': {
      const params = z.object({
        customerId: z.string(),
        status: z.string().optional(),
        fromDate: z.string().optional(),
        toDate: z.string().optional(),
        limit: z.number().default(10),
      }).parse(args);

      const orders = await db.orders.findMany({
        where: {
          tenantId: context.tenantId,  // Always scope to tenant
          customerId: params.customerId,
          ...(params.status && { status: params.status }),
          ...(params.fromDate && {
            createdAt: { gte: new Date(params.fromDate) }
          }),
        },
        take: params.limit,
        orderBy: { createdAt: 'desc' },
      });

      return JSON.stringify(orders);
    }

    // (An update_order_status case would follow the same pattern,
    // plus a permission check before mutating order state)
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}

// Agentic loop: run until model stops calling tools
export async function runAgentLoop(
  userMessage: string,
  systemPrompt: string,
  context: { userId: string; tenantId: string },
): Promise<string> {
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userMessage },
  ];

  const tools = [getOrdersTool, updateOrderStatusTool];

  for (let iteration = 0; iteration < 5; iteration++) {  // Max 5 tool calls
    const response = await openai.chat.completions.create({
      model: MODELS.smart,
      messages,
      tools: tools.map(t => ({ type: 'function', function: t })),
      tool_choice: 'auto',
    });

    const message = response.choices[0].message;
    messages.push(message);

    // No tool calls — model is done
    if (!message.tool_calls?.length) {
      return message.content ?? '';
    }

    // Execute each tool call
    for (const toolCall of message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      const result = await executeTool(toolCall.function.name, args, context);

      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: result,
      });
    }
  }

  return 'Maximum iterations reached';
}
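The loop above passes model-generated arguments straight into JSON.parse, so malformed JSON would throw and abort the conversation. A defensive sketch (the `parseToolArgs` helper is ours): return the parse error as the tool result instead, so the model can see it and retry with corrected arguments.

```typescript
// Parse tool-call arguments defensively. Instead of throwing on
// malformed JSON, return an error payload that can be fed back to the
// model as the tool result, letting it self-correct on the next turn.
export function parseToolArgs(
  raw: string,
): { ok: true; args: unknown } | { ok: false; error: string } {
  try {
    return { ok: true, args: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: `Invalid JSON arguments: ${(err as Error).message}` };
  }
}
```

In the agentic loop, an `ok: false` result would be pushed as the `tool` message content rather than thrown, keeping the conversation alive.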

Prompt Management

Prompts are code. Version them, test them, and don't hardcode them:

// lib/prompts.ts
// Centralized prompt registry with versioning

const PROMPTS = {
  'support-assistant': {
    v1: `You are a helpful customer support assistant for Viprasol.
Answer questions about orders, billing, and account settings.
Be concise and friendly. If you don't know, say so honestly.
Never make up information about order status or billing.`,

    v2: `You are a customer support specialist for Viprasol's order management platform.
CAPABILITIES: Look up orders, check status, explain billing
CONSTRAINTS: Never disclose other customers' data. Never promise refunds without escalating.
TONE: Professional, empathetic, solution-focused
If unable to help, say: "Let me connect you with our support team at support@viprasol.com"`,
  },
} as const;

export function getPrompt(name: keyof typeof PROMPTS, version: 'v1' | 'v2' = 'v2'): string {
  return PROMPTS[name][version];
}
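Static strings only go so far; most prompts need runtime variables (customer name, plan tier, current date). A minimal interpolation helper to pair with the registry (the `{{var}}` syntax and `renderPrompt` name are our convention, not part of the code above):

```typescript
// Fill {{placeholders}} in a prompt template with runtime values.
// Throws on missing variables so a broken prompt fails loudly in
// tests rather than silently shipping "{{name}}" to the model.
export function renderPrompt(
  template: string,
  vars: Record<string, string>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    if (!(key in vars)) throw new Error(`Missing prompt variable: ${key}`);
    return vars[key];
  });
}
```

Because prompts live in a typed registry and interpolation fails fast, prompt changes can be covered by ordinary unit tests alongside the rest of the codebase.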

⚡ Your Competitors Are Already Using AI — Are You?

We build AI systems that actually work in production — not demos that die in a Colab notebook. From data pipeline to deployed model to real business outcomes.

  • AI agent systems that run autonomously — not just chatbots
  • Integrates with your existing tools (CRM, ERP, Slack, etc.)
  • Explainable outputs — know why the model decided what it did
  • Free AI opportunity audit for your business

Cost Control and Token Estimation

// lib/token-cost.ts
import OpenAI from 'openai';
import { db } from './db';  // Your database client (assumed)

// 2026 pricing (approximate — check openai.com/pricing)
const COSTS_PER_1M_TOKENS = {
  'gpt-4o': { input: 2.50, output: 10.00 },
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
  'text-embedding-3-small': { input: 0.02, output: 0 },
} as const;

export function estimateCost(
  model: keyof typeof COSTS_PER_1M_TOKENS,
  inputTokens: number,
  outputTokens: number,
): number {
  const pricing = COSTS_PER_1M_TOKENS[model];
  return (
    (inputTokens / 1_000_000) * pricing.input +
    (outputTokens / 1_000_000) * pricing.output
  );
}

// Track actual usage from API response
export async function trackLLMUsage(
  model: string,
  usage: OpenAI.CompletionUsage,
  metadata: { userId: string; feature: string },
): Promise<void> {
  const cost = estimateCost(
    model as keyof typeof COSTS_PER_1M_TOKENS,
    usage.prompt_tokens,
    usage.completion_tokens,
  );

  await db.llmUsage.create({
    data: {
      userId: metadata.userId,
      feature: metadata.feature,
      model,
      inputTokens: usage.prompt_tokens,
      outputTokens: usage.completion_tokens,
      estimatedCostCents: Math.round(cost * 100),
      createdAt: new Date(),
    },
  });
}

// Per-user cost limits
export async function checkUserCostLimit(
  userId: string,
  limitCentsPerDay: number = 100,  // $1.00/day default
): Promise<boolean> {
  const today = new Date();
  today.setHours(0, 0, 0, 0);

  const todaySpend = await db.llmUsage.aggregate({
    where: { userId, createdAt: { gte: today } },
    _sum: { estimatedCostCents: true },
  });

  return (todaySpend._sum.estimatedCostCents ?? 0) < limitCentsPerDay;
}
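`checkUserCostLimit` runs before the request, but actual token usage only arrives in the response, so a pre-request budget check needs a rough input estimate. A common heuristic is about 4 characters per token for English text; this is an approximation, not a tokenizer (for exact counts, a tokenizer library such as `tiktoken` is the usual choice):

```typescript
// Rough pre-request token estimate: ~4 characters per token for
// English text. Good enough for budget pre-checks and truncation
// decisions; use a real tokenizer when precision matters.
export function estimateInputTokens(messages: { content: string }[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```

Combined with the pricing table above, this lets you reject or truncate an over-budget request before spending anything on it.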

Rate Limit Handling

You will hit OpenAI's rate limits in production. Handle them gracefully:

// lib/llm-client.ts
import OpenAI from 'openai';
import { openai } from './openai';

export async function completionWithRetry(
  params: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming,
  retries = 3,
): Promise<OpenAI.Chat.ChatCompletion> {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await openai.chat.completions.create(params);
    } catch (err) {
      if (err instanceof OpenAI.RateLimitError) {
        if (attempt === retries - 1) throw err;

        // Exponential backoff, doubling per attempt: 2s, 4s, ...
        const delay = 2_000 * Math.pow(2, attempt);
        console.warn(`Rate limited. Retrying in ${delay}ms (attempt ${attempt + 1}/${retries})`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      if (err instanceof OpenAI.APIConnectionError || err instanceof OpenAI.InternalServerError) {
        if (attempt === retries - 1) throw err;
        await new Promise(resolve => setTimeout(resolve, 1_000 * (attempt + 1)));
        continue;
      }

      throw err;  // Non-retryable errors (auth, invalid request)
    }
  }

  throw new Error('Max retries exceeded');
}
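Fixed exponential delays have one weakness: many concurrent requests rate-limited at the same moment will all retry at the same moment, too (a thundering herd). Adding random jitter spreads the retries out. A sketch of the delay calculation with full jitter (the `backoffDelayMs` helper is ours):

```typescript
// Exponential backoff with full jitter: the cap doubles per attempt
// (2s, 4s, 8s, ...) up to maxMs, and the actual delay is a uniform
// random value in [0, cap). Spreads concurrent retries so they don't
// all hit the rate limiter again at the same instant.
export function backoffDelayMs(
  attempt: number,
  baseMs = 2_000,
  maxMs = 30_000,
): number {
  const cap = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(Math.random() * cap);
}
```

Dropping this into `completionWithRetry` in place of the fixed `delay` calculation keeps the same worst-case wait while decorrelating concurrent clients.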

Working With Viprasol

We build production LLM integrations — streaming chat interfaces, agentic tool-calling systems, RAG pipelines, prompt management, cost monitoring, and LLM observability. AI features ship reliably when the engineering is right.

Talk to our team about AI and LLM integration for your product.


About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading

Want to Implement AI in Your Business?

From chatbots to predictive models — harness the power of AI with a team that delivers.

Free consultation • No commitment • Response within 24 hours
