
OpenTelemetry in Production: Traces, Metrics, and Logs That Actually Help

Set up OpenTelemetry in Node.js and Python services. Auto-instrumentation, custom spans, OTLP export to Jaeger/Grafana Tempo, and correlating traces with logs.

Viprasol Tech Team
July 7, 2026
14 min read


Every distributed system eventually develops the same problem: something is slow or broken, and nobody knows where. "Is it the database?" "Is it the upstream API?" "Why does this request take 3 seconds sometimes?" The answer is usually buried in logs across four services that share no correlation IDs.

OpenTelemetry (OTel) solves this by creating a vendor-neutral, language-agnostic standard for traces, metrics, and logs — the three pillars of observability. In 2026, OTel is stable, widely supported, and the right answer for any team that cares about production reliability.

This post covers a production-ready OTel setup: Node.js auto-instrumentation, custom spans, OTLP export, and correlating the three signals for faster incident resolution.


The Three Signals — and How They Relate

Signal     What It Captures                                    Best For
Traces     Request flow across services with timing            Finding where latency lives
Metrics    Aggregated numbers (counters, histograms, gauges)   Alerting, dashboards, SLOs
Logs       Timestamped text with context                       Debugging specific errors

The magic happens when you correlate them: a trace ID in your log entry links the log to the exact span where the error occurred. OTel makes this correlation automatic.
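That correlation rides on the W3C Trace Context standard: services pass a `traceparent` header whose trace ID ties every span and log line in a request together. A minimal sketch of what that header carries (a hypothetical parser for illustration only; the OTel SDK handles propagation for you):

```typescript
// W3C traceparent: "{version}-{trace-id}-{parent-id}-{trace-flags}"
// e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
interface TraceParent {
  version: string;
  traceId: string;   // 32 hex chars, shared by every service in the request
  parentId: string;  // 16 hex chars, the span ID of the caller
  sampled: boolean;  // lowest bit of trace-flags
}

function parseTraceparent(header: string): TraceParent | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  // All-zero trace or span IDs are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x1) === 1 };
}
```

Every service in the call chain sees the same `trace-id`, which is exactly the value you will later stamp onto log lines and search for in your tracing backend.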


Architecture: OTel Collector as Central Hub

App Services (Node.js, Python, Go)
    │ OTLP/gRPC (4317)
    ▼
OTel Collector  ──── Traces ────► Jaeger / Grafana Tempo
                ──── Metrics ───► Prometheus / Grafana Mimir
                ──── Logs ──────► Loki / Elasticsearch

Never export directly from your app to Jaeger/Prometheus in production — the OTel Collector handles batching, retry, transformation, and routing. Apps emit OTLP; the Collector fans out to backends.



Node.js: Auto-Instrumentation Setup

The fastest way to get traces is auto-instrumentation — OTel wraps http, express, pg, redis, axios, and 40+ libraries automatically.

npm install \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-grpc \
  @opentelemetry/exporter-metrics-otlp-grpc \
  @opentelemetry/sdk-metrics

// src/instrumentation.ts — must be imported FIRST, before any app code
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.SERVICE_NAME ?? 'api-service',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '1.0.0',
    'deployment.environment': process.env.NODE_ENV ?? 'production',
  }),

  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://otel-collector:4317',
  }),

  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://otel-collector:4317',
    }),
    exportIntervalMillis: 15_000, // Every 15 seconds
  }),

  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false }, // Too noisy
      '@opentelemetry/instrumentation-http': {
        ignoreIncomingRequestHook: (req) => {
          // Don't trace health checks
          return req.url === '/health' || req.url === '/ready';
        },
      },
      '@opentelemetry/instrumentation-pg': { enhancedDatabaseReporting: true },
    }),
  ],
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});

// src/index.ts — instrumentation must be the first import
import './instrumentation';
import Fastify from 'fastify';
// ... rest of app

With this setup, every HTTP request, database query, and Redis operation is traced automatically. No code changes needed in your route handlers.


Custom Spans: Adding Business Context

Auto-instrumentation gives you infrastructure spans. Custom spans add business context — which payment processor was called, which feature flag was evaluated, how many items were in the cart.

// src/lib/tracing.ts
import { trace, context, SpanStatusCode, SpanKind } from '@opentelemetry/api';

const tracer = trace.getTracer('api-service', '1.0.0');

// Wrapper for adding spans to async functions
export async function withSpan<T>(
  name: string,
  fn: () => Promise<T>,
  attributes?: Record<string, string | number | boolean>,
): Promise<T> {
  return tracer.startActiveSpan(name, { attributes }, async (span) => {
    try {
      const result = await fn();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(error) });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}

// src/services/payment.ts
import { trace } from '@opentelemetry/api';
import { withSpan } from '@/lib/tracing';

export async function processPayment(order: Order): Promise<PaymentResult> {
  return withSpan(
    'payment.process',
    async () => {
      // Stripe call is auto-instrumented via http
      // We add business-level context here
      const span = trace.getActiveSpan();
      span?.setAttributes({
        'payment.amount': order.total,
        'payment.currency': order.currency,
        'payment.method': order.paymentMethod,
        'order.id': order.id,
        'order.item_count': order.items.length,
      });

      const result = await stripe.charges.create({
        amount: Math.round(order.total * 100),
        currency: order.currency,
        source: order.paymentToken,
        metadata: { orderId: order.id },
      });

      span?.setAttributes({
        'payment.charge_id': result.id,
        'payment.status': result.status,
      });

      return result;
    },
    { 'order.id': order.id },
  );
}


Custom Metrics

// src/lib/metrics.ts
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('api-service', '1.0.0');

// Counters — track totals
export const httpRequestCounter = meter.createCounter('http.requests.total', {
  description: 'Total HTTP requests by route and status',
});

// Histograms — track distributions
export const orderValueHistogram = meter.createHistogram('order.value', {
  description: 'Distribution of order values in USD',
  unit: 'USD',
  advice: {
    explicitBucketBoundaries: [10, 25, 50, 100, 250, 500, 1000, 5000],
  },
});

// Gauges — track current state
export const activeWebSocketsGauge = meter.createObservableGauge(
  'websocket.connections.active',
  { description: 'Currently active WebSocket connections' },
);

// Register observable gauge callback
activeWebSocketsGauge.addCallback((observableResult) => {
  observableResult.observe(wsManager.getConnectionCount(), {
    'server.instance': process.env.HOSTNAME ?? 'unknown',
  });
});

// Usage in route handler
export function recordOrderMetrics(order: Order) {
  orderValueHistogram.record(order.total, {
    'order.currency': order.currency,
    'order.region': order.region,
    'payment.method': order.paymentMethod,
  });
}
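Under the hood, an explicit-bucket histogram just counts how many recorded values fall into each bucket; the SDK exports those counts, and the backend computes percentiles from them. A toy sketch of the bucketing (illustration only; the SDK does this for you):

```typescript
// Explicit boundaries [b0..bN-1] define N+1 buckets:
// (-inf, b0], (b0, b1], ..., (bN-1, +inf) — upper bound inclusive.
const boundaries = [10, 25, 50, 100, 250, 500, 1000, 5000];

function bucketIndex(value: number, bounds: number[]): number {
  // First boundary the value does not exceed; the final bucket catches overflow
  for (let i = 0; i < bounds.length; i++) {
    if (value <= bounds[i]) return i;
  }
  return bounds.length;
}

// Simulate recording a batch of order values
const counts = new Array(boundaries.length + 1).fill(0);
for (const v of [7.99, 42, 42, 120, 9999]) counts[bucketIndex(v, boundaries)]++;
```

Choosing boundaries that match your real value distribution matters: a $10–$5,000 order range with these buckets gives useful P50/P95 estimates, while default boundaries tuned for request latency would lump everything into one or two buckets.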

Correlating Logs with Traces

This is where observability gets powerful. When a log entry contains the trace ID, you can jump from a Loki log line directly to the Jaeger trace.

// src/lib/logger.ts
import pino from 'pino';
import { trace } from '@opentelemetry/api';

function getTraceContext() {
  const span = trace.getActiveSpan();
  if (!span) return {};
  
  const ctx = span.spanContext();
  return {
    traceId: ctx.traceId,
    spanId: ctx.spanId,
    // Grafana Tempo expects these field names
    'trace_id': ctx.traceId,
    'span_id': ctx.spanId,
  };
}

export const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  formatters: {
    log(obj) {
      return { ...obj, ...getTraceContext() };
    },
  },
  transport: process.env.NODE_ENV !== 'production'
    ? { target: 'pino-pretty' }
    : undefined,
});

Now every log line automatically includes traceId and spanId. In Grafana, you can configure the Loki datasource to derive fields and create links to Tempo traces.
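The formatter above is doing nothing more than spreading the span context into each record. A self-contained sketch of that merge, with a plain object standing in for the live span context:

```typescript
interface SpanCtx { traceId: string; spanId: string }

// Mirrors what the pino formatter emits: the original fields plus both
// camelCase and snake_case IDs (the latter for Grafana Tempo derived fields)
function withTraceContext(record: Record<string, unknown>, ctx: SpanCtx | null) {
  if (!ctx) return record; // no active span — log the record unchanged
  return {
    ...record,
    traceId: ctx.traceId,
    spanId: ctx.spanId,
    trace_id: ctx.traceId,
    span_id: ctx.spanId,
  };
}

const line = withTraceContext(
  { level: 'error', msg: 'payment failed', orderId: 'ord_123' },
  { traceId: '4bf92f3577b34da6a3ce929d0e0e4736', spanId: '00f067aa0ba902b7' },
);
```

The duplication of `traceId`/`trace_id` is deliberate: your application code and alerting queries can use one convention while the Grafana datasource regex matches the other.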


OTel Collector Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

  memory_limiter:
    limit_mib: 512
    spike_limit_mib: 128
    check_interval: 5s

  # Add environment tag to all telemetry
  resource:
    attributes:
      - key: deployment.environment
        value: ${DEPLOYMENT_ENV}
        action: upsert

  # Tail-based sampling: keep 100% of error traces, 10% of the rest.
  # Requires the contrib Collector build; add it to the traces pipeline to enable.
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

exporters:
  otlp/tempo:
    endpoint: http://tempo:4317
    tls:
      insecure: true

  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: otel

  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]

    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]

Python: FastAPI with OTel

# instrumentation.py
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

import os

def setup_telemetry(app=None):
    resource = Resource.create({
        SERVICE_NAME: os.getenv("SERVICE_NAME", "python-service"),
        "deployment.environment": os.getenv("ENVIRONMENT", "production"),
    })

    otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317")

    # Traces
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True))
    )
    trace.set_tracer_provider(tracer_provider)

    # Metrics
    reader = PeriodicExportingMetricReader(
        OTLPMetricExporter(endpoint=otlp_endpoint, insecure=True),
        export_interval_millis=15_000,
    )
    metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))

    # Auto-instrument
    if app:
        FastAPIInstrumentor.instrument_app(app, excluded_urls="health,ready")
    SQLAlchemyInstrumentor().instrument()
    HTTPXClientInstrumentor().instrument()

# main.py
from fastapi import FastAPI
from instrumentation import setup_telemetry
from opentelemetry import trace

app = FastAPI()
setup_telemetry(app)

tracer = trace.get_tracer(__name__)

@app.get("/orders/{order_id}")
async def get_order(order_id: str):
    with tracer.start_as_current_span("order.fetch") as span:
        span.set_attribute("order.id", order_id)
        order = await db.fetch_order(order_id)
        span.set_attribute("order.status", order.status)
        return order

Sampling Strategy

Sampling is essential — tracing every request in high-traffic systems generates terabytes of data.

Strategy                      When to Use                                Rate
Head-based (probabilistic)    Uniform traffic, cost control              1–10%
Tail-based (error-focused)    Keep all errors, sample successes          Errors: 100%, Success: 5%
Rate limiting                 Bursty traffic                             Max N traces/sec
Parent-based                  Microservices — follow caller's decision   Inherit from parent

For most production systems: tail-based sampling in the Collector — sample 5% of normal traces, 100% of error traces, and 100% of traces exceeding P95 latency.
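Head-based probabilistic sampling is usually keyed on the trace ID itself, so every service makes the same keep/drop decision for a given trace without any coordination. A simplified sketch in the spirit of the SDK's TraceIdRatioBased sampler (not its exact algorithm):

```typescript
// Map the last 8 hex chars of the trace ID onto [0, 1) and compare to the ratio.
// Because every service sees the same trace ID, they all agree on the decision.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const slice = parseInt(traceId.slice(-8), 16); // 32-bit slice of the ID
  return slice / 0x100000000 < ratio;
}
```

Tail-based sampling inverts this: the Collector buffers whole traces, then applies policies (error status, latency thresholds) before deciding, which is why it has to run centrally rather than in each service.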


Cost Comparison: OTel Backends

Backend                      Traces   Metrics   Logs      Pricing Model      Est. Monthly (10K req/min)
Grafana Cloud                Tempo    Mimir     Loki      Usage-based        $50–$200
Jaeger OSS                   Yes      No        No        Self-hosted        $20–$80 (infra)
Datadog APM                  Yes      Yes       Yes       Per host + spans   $300–$800
Honeycomb                    Yes      Limited   Limited   Per event          $150–$500
AWS X-Ray + CloudWatch       Yes      Yes       Yes       Per trace/event    $100–$400
Self-hosted Grafana Stack    Tempo    Mimir     Loki      Infra only         $80–$200

For startups: Grafana Cloud free tier (50GB traces, 10K metrics) handles most early-stage loads. Switch to self-hosted when monthly cost exceeds $200.
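The estimates above are driven mostly by span volume, which is easy to project for yourself. A back-of-the-envelope sketch (the spans-per-request and bytes-per-span figures are assumptions; plug in your own):

```typescript
// Rough trace volume projection for the 10K req/min workload above.
const reqPerMin = 10_000;
const spansPerRequest = 15;   // assumption: typical service fan-out
const bytesPerSpan = 500;     // assumption: average encoded span size
const samplingRate = 0.10;    // 10% head sampling

const tracesPerMonth = reqPerMin * 60 * 24 * 30;  // 432,000,000
const spansStored = tracesPerMonth * spansPerRequest * samplingRate;
const gbPerMonth = (spansStored * bytesPerSpan) / 1e9;

console.log(`${(spansStored / 1e6).toFixed(0)}M spans ≈ ${gbPerMonth.toFixed(0)} GB/month`);
// → "648M spans ≈ 324 GB/month"
```

Even at 10% sampling, a workload this size blows well past a 50 GB free tier, which is why the sampling rate is usually the first knob teams turn when the bill arrives.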


Working With Viprasol

Our platform engineering team implements end-to-end observability stacks — from OTel SDK setup in your services to Grafana dashboards that surface actionable insights in minutes.

What we deliver:

  • OTel Collector deployment (Kubernetes/ECS) with sampling config
  • Auto-instrumentation for Node.js, Python, Go services
  • Custom span and metric instrumentation for business events
  • Grafana dashboards: RED metrics, SLO tracking, error rate alerts
  • Trace-to-log correlation across all services

Discuss your observability needs · Cloud infrastructure services

