
LLM Integration in Production: OpenAI SDK, Streaming, Function Calling, and Cost Control

Build production LLM integrations in 2026 — OpenAI SDK streaming responses, function calling with typed tools, prompt versioning, token cost estimation, and retry logic with exponential backoff.

Viprasol Tech Team
June 18, 2026
13 min read

Integrating LLMs into a production SaaS product is more complex than calling the OpenAI API in a demo. Production concerns include: streaming for perceived performance, function calling for structured outputs, prompt versioning, retry logic for rate limits, cost monitoring, and observability.


The OpenAI SDK Setup

// lib/openai.ts
import OpenAI from 'openai';

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,       // Automatic retry on 429/5xx
  timeout: 30_000,     // 30s timeout per request
});

// Model reference — centralize to make upgrades easy
export const MODELS = {
  fast: 'gpt-4o-mini',        // Cheap, fast — routing, classification, short tasks
  smart: 'gpt-4o',             // Expensive, capable — complex reasoning, long context
  embedding: 'text-embedding-3-small',
} as const;
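Centralizing model names also makes it easy to route requests by task type instead of hardcoding one model everywhere. A minimal sketch (the task labels and the `pickModel` helper are our illustration, not part of the setup above):

```typescript
// Hypothetical task-based model router. The task names and the
// mapping below are illustrative assumptions; tune them for your app.
type LLMTask = 'classify' | 'summarize' | 'reason';

const MODELS = {
  fast: 'gpt-4o-mini',   // Cheap, fast tier
  smart: 'gpt-4o',       // Expensive, capable tier
} as const;

export function pickModel(task: LLMTask): string {
  // Route only open-ended reasoning to the expensive model;
  // classification and summarization run fine on the cheap tier.
  return task === 'reason' ? MODELS.smart : MODELS.fast;
}
```

This keeps model-upgrade decisions in one place: when a new model ships, you change the registry, not every call site.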

Streaming Responses

Streaming makes LLM responses feel instant — the user sees tokens as they're generated rather than waiting for the full response:

// app/api/chat/route.ts — Next.js streaming route
import { openai, MODELS } from '@/lib/openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';  // Vercel AI SDK v3 helpers (v4 replaced these with streamText)

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json();

  const response = await openai.chat.completions.create({
    model: MODELS.smart,
    messages: [
      { role: 'system', content: systemPrompt },
      ...messages,
    ],
    stream: true,
    max_tokens: 1000,
    temperature: 0.7,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
// Alternative (separate route file): manual streaming without the Vercel AI SDK
export async function POST(req: Request) {
  const { message } = await req.json();  // Parse the body once, up front

  const stream = new TransformStream();
  const writer = stream.writable.getWriter();
  const encoder = new TextEncoder();

  const completion = await openai.chat.completions.create({
    model: MODELS.smart,
    messages: [{ role: 'user', content: message }],
    stream: true,
  });

  (async () => {
    for await (const chunk of completion) {
      const content = chunk.choices[0]?.delta?.content ?? '';
      if (content) {
        await writer.write(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`));
      }
    }
    await writer.write(encoder.encode('data: [DONE]\n\n'));
    await writer.close();
  })();

  return new Response(stream.readable, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  });
}

Client-side streaming consumption:

// components/ChatInterface.tsx
'use client';
import { useChat } from 'ai/react';  // Vercel AI SDK hook

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      <div className="messages">
        {messages.map(m => (
          <div key={m.id} className={m.role === 'user' ? 'user' : 'assistant'}>
            {m.content}
          </div>
        ))}
        {isLoading && <div className="typing-indicator">...</div>}
      </div>

      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask something..." />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}
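If you consume the manual SSE route without the `useChat` hook, the client has to parse `data:` lines itself. A minimal parser sketch (the event format matches the manual route above; the `parseSSEChunk` name is ours):

```typescript
// Parse one SSE chunk into content strings. Matches the
// `data: {"content": "..."}` / `data: [DONE]` format emitted by the
// manual streaming route.
export function parseSSEChunk(chunk: string): string[] {
  const contents: string[] = [];
  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break;
    try {
      const parsed = JSON.parse(payload) as { content?: string };
      if (parsed.content) contents.push(parsed.content);
    } catch {
      // Ignore JSON split across network chunks; a real client
      // buffers incomplete lines until the next chunk arrives
    }
  }
  return contents;
}
```

In a real client, pair this with `fetch` and `response.body.getReader()`, carrying any incomplete trailing line over to the next chunk before parsing.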

🤖 AI Is Not the Future — It Is Right Now

Businesses using AI automation cut manual work by 60–80%. We build production-ready AI systems — RAG pipelines, LLM integrations, custom ML models, and AI agent workflows.

  • LLM integration (OpenAI, Anthropic, Gemini, local models)
  • RAG systems that answer from your own data
  • AI agents that take real actions — not just chat
  • Custom ML models for prediction, classification, detection

Function Calling (Tool Use)

Function calling lets the LLM invoke structured tools — returning typed JSON instead of prose:

// lib/llm-tools.ts
import OpenAI from 'openai';
import { openai, MODELS } from './openai';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import { db } from './db';  // Your database client (Prisma-style API assumed)

// Define a typed tool using Zod
const getOrdersTool = {
  name: 'get_orders',
  description: 'Retrieve orders for a customer, optionally filtered by status or date range',
  parameters: zodToJsonSchema(z.object({
    customerId: z.string().describe('Customer UUID'),
    status: z.enum(['PENDING', 'PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED']).optional(),
    fromDate: z.string().optional().describe('ISO date string e.g. 2026-01-01'),
    toDate: z.string().optional().describe('ISO date string'),
    limit: z.number().min(1).max(50).default(10),
  })),
};

const updateOrderStatusTool = {
  name: 'update_order_status',
  description: 'Update the status of an order (requires manager permissions)',
  parameters: zodToJsonSchema(z.object({
    orderId: z.string().describe('Order UUID'),
    status: z.enum(['PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED']),
    reason: z.string().optional().describe('Reason for status change'),
  })),
};

// Execute tool calls with type safety
async function executeTool(
  toolName: string,
  args: unknown,
  context: { userId: string; tenantId: string },
): Promise<string> {
  switch (toolName) {
    case 'get_orders': {
      const params = z.object({
        customerId: z.string(),
        status: z.string().optional(),
        fromDate: z.string().optional(),
        toDate: z.string().optional(),
        limit: z.number().default(10),
      }).parse(args);

      const orders = await db.orders.findMany({
        where: {
          tenantId: context.tenantId,  // Always scope to tenant
          customerId: params.customerId,
          ...(params.status && { status: params.status }),
          ...(params.fromDate && {
            createdAt: { gte: new Date(params.fromDate) }
          }),
        },
        take: params.limit,
        orderBy: { createdAt: 'desc' },
      });

      return JSON.stringify(orders);
    }

    // (An update_order_status case would follow the same pattern,
    // plus a permission check before mutating order state)
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}

// Agentic loop: run until model stops calling tools
export async function runAgentLoop(
  userMessage: string,
  systemPrompt: string,
  context: { userId: string; tenantId: string },
): Promise<string> {
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userMessage },
  ];

  const tools = [getOrdersTool, updateOrderStatusTool];

  for (let iteration = 0; iteration < 5; iteration++) {  // Max 5 tool calls
    const response = await openai.chat.completions.create({
      model: MODELS.smart,
      messages,
      tools: tools.map(t => ({ type: 'function', function: t })),
      tool_choice: 'auto',
    });

    const message = response.choices[0].message;
    messages.push(message);

    // No tool calls — model is done
    if (!message.tool_calls?.length) {
      return message.content ?? '';
    }

    // Execute each tool call
    for (const toolCall of message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      const result = await executeTool(toolCall.function.name, args, context);

      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: result,
      });
    }
  }

  return 'Maximum iterations reached';
}
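The loop above passes model-generated arguments straight into JSON.parse, so malformed JSON would throw and abort the conversation. A defensive sketch (the `parseToolArgs` helper is ours): return the parse error as the tool result instead, so the model can see it and retry with corrected arguments.

```typescript
// Parse tool-call arguments defensively. Instead of throwing on
// malformed JSON, return an error payload that can be fed back to the
// model as the tool result, letting it self-correct on the next turn.
export function parseToolArgs(
  raw: string,
): { ok: true; args: unknown } | { ok: false; error: string } {
  try {
    return { ok: true, args: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: `Invalid JSON arguments: ${(err as Error).message}` };
  }
}
```

In the agentic loop, an `ok: false` result would be pushed as the `tool` message content rather than thrown, keeping the conversation alive.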

Prompt Management

Prompts are code. Version them, test them, and don't hardcode them:

// lib/prompts.ts
// Centralized prompt registry with versioning

const PROMPTS = {
  'support-assistant': {
    v1: `You are a helpful customer support assistant for Viprasol.
Answer questions about orders, billing, and account settings.
Be concise and friendly. If you don't know, say so honestly.
Never make up information about order status or billing.`,

    v2: `You are a customer support specialist for Viprasol's order management platform.
CAPABILITIES: Look up orders, check status, explain billing
CONSTRAINTS: Never disclose other customers' data. Never promise refunds without escalating.
TONE: Professional, empathetic, solution-focused
If unable to help, say: "Let me connect you with our support team at support@viprasol.com"`,
  },
} as const;

export function getPrompt(name: keyof typeof PROMPTS, version: 'v1' | 'v2' = 'v2'): string {
  return PROMPTS[name][version];
}
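Static strings only go so far; most prompts need runtime variables (customer name, plan tier, current date). A minimal interpolation helper to pair with the registry (the `{{var}}` syntax and `renderPrompt` name are our convention, not part of the code above):

```typescript
// Fill {{placeholders}} in a prompt template with runtime values.
// Throws on missing variables so a broken prompt fails loudly in
// tests rather than silently shipping "{{name}}" to the model.
export function renderPrompt(
  template: string,
  vars: Record<string, string>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => {
    if (!(key in vars)) throw new Error(`Missing prompt variable: ${key}`);
    return vars[key];
  });
}
```

Because prompts live in a typed registry and interpolation fails fast, prompt changes can be covered by ordinary unit tests alongside the rest of the codebase.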

⚡ Your Competitors Are Already Using AI — Are You?

We build AI systems that actually work in production — not demos that die in a Colab notebook. From data pipeline to deployed model to real business outcomes.

  • AI agent systems that run autonomously — not just chatbots
  • Integrates with your existing tools (CRM, ERP, Slack, etc.)
  • Explainable outputs — know why the model decided what it did
  • Free AI opportunity audit for your business

Cost Control and Token Estimation

// lib/token-cost.ts
import OpenAI from 'openai';
import { db } from './db';  // Your database client (assumed)

// 2026 pricing (approximate — check openai.com/pricing)
const COSTS_PER_1M_TOKENS = {
  'gpt-4o': { input: 2.50, output: 10.00 },
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
  'text-embedding-3-small': { input: 0.02, output: 0 },
} as const;

export function estimateCost(
  model: keyof typeof COSTS_PER_1M_TOKENS,
  inputTokens: number,
  outputTokens: number,
): number {
  const pricing = COSTS_PER_1M_TOKENS[model];
  return (
    (inputTokens / 1_000_000) * pricing.input +
    (outputTokens / 1_000_000) * pricing.output
  );
}

// Track actual usage from API response
export async function trackLLMUsage(
  model: string,
  usage: OpenAI.CompletionUsage,
  metadata: { userId: string; feature: string },
): Promise<void> {
  const cost = estimateCost(
    model as keyof typeof COSTS_PER_1M_TOKENS,
    usage.prompt_tokens,
    usage.completion_tokens,
  );

  await db.llmUsage.create({
    data: {
      userId: metadata.userId,
      feature: metadata.feature,
      model,
      inputTokens: usage.prompt_tokens,
      outputTokens: usage.completion_tokens,
      estimatedCostCents: Math.round(cost * 100),
      createdAt: new Date(),
    },
  });
}

// Per-user cost limits
export async function checkUserCostLimit(
  userId: string,
  limitCentsPerDay: number = 100,  // $1.00/day default
): Promise<boolean> {
  const today = new Date();
  today.setHours(0, 0, 0, 0);

  const todaySpend = await db.llmUsage.aggregate({
    where: { userId, createdAt: { gte: today } },
    _sum: { estimatedCostCents: true },
  });

  return (todaySpend._sum.estimatedCostCents ?? 0) < limitCentsPerDay;
}
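`checkUserCostLimit` runs before the request, but actual token usage only arrives in the response, so a pre-request budget check needs a rough input estimate. A common heuristic is about 4 characters per token for English text; this is an approximation, not a tokenizer (for exact counts, a tokenizer library such as `tiktoken` is the usual choice):

```typescript
// Rough pre-request token estimate: ~4 characters per token for
// English text. Good enough for budget pre-checks and truncation
// decisions; use a real tokenizer when precision matters.
export function estimateInputTokens(messages: { content: string }[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```

Combined with the pricing table above, this lets you reject or truncate an over-budget request before spending anything on it.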

Rate Limit Handling

You will hit OpenAI's rate limits in production. Handle them gracefully:

// lib/llm-client.ts
import OpenAI from 'openai';
import { openai } from './openai';

export async function completionWithRetry(
  params: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming,
  retries = 3,
): Promise<OpenAI.Chat.ChatCompletion> {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await openai.chat.completions.create(params);
    } catch (err) {
      if (err instanceof OpenAI.RateLimitError) {
        if (attempt === retries - 1) throw err;

        // Exponential backoff, doubling per attempt: 2s, 4s, ...
        const delay = 2_000 * Math.pow(2, attempt);
        console.warn(`Rate limited. Retrying in ${delay}ms (attempt ${attempt + 1}/${retries})`);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      if (err instanceof OpenAI.APIConnectionError || err instanceof OpenAI.InternalServerError) {
        if (attempt === retries - 1) throw err;
        await new Promise(resolve => setTimeout(resolve, 1_000 * (attempt + 1)));
        continue;
      }

      throw err;  // Non-retryable errors (auth, invalid request)
    }
  }

  throw new Error('Max retries exceeded');
}
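Fixed exponential delays have one weakness: many concurrent requests rate-limited at the same moment will all retry at the same moment, too (a thundering herd). Adding random jitter spreads the retries out. A sketch of the delay calculation with full jitter (the `backoffDelayMs` helper is ours):

```typescript
// Exponential backoff with full jitter: the cap doubles per attempt
// (2s, 4s, 8s, ...) up to maxMs, and the actual delay is a uniform
// random value in [0, cap). Spreads concurrent retries so they don't
// all hit the rate limiter again at the same instant.
export function backoffDelayMs(
  attempt: number,
  baseMs = 2_000,
  maxMs = 30_000,
): number {
  const cap = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(Math.random() * cap);
}
```

Dropping this into `completionWithRetry` in place of the fixed `delay` calculation keeps the same worst-case wait while decorrelating concurrent clients.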

Working With Viprasol

We build production LLM integrations — streaming chat interfaces, agentic tool-calling systems, RAG pipelines, prompt management, cost monitoring, and LLM observability. AI features ship reliably when the engineering is right.

Talk to our team about AI and LLM integration for your product.


About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading

Want to Implement AI in Your Business?

From chatbots to predictive models — harness the power of AI with a team that delivers.

Free consultation • No commitment • Response within 24 hours
