
OpenTelemetry in Production: Traces, Metrics, and Logs That Actually Help

Set up OpenTelemetry in Node.js and Python services. Auto-instrumentation, custom spans, OTLP export to Jaeger/Grafana Tempo, and correlating traces with logs.

Viprasol Tech Team
July 7, 2026
14 min read

OpenTelemetry: Observability for Modern Applications (2026)

Observability is the difference between firefighting in the dark and methodically solving problems. At Viprasol, we've moved from the fragmented world of multiple monitoring tools to OpenTelemetry—a unified approach to collecting, processing, and exporting telemetry data. This shift has transformed how we understand what's happening inside our applications.

The Observability Crisis We Solved

Five years ago, our monitoring setup looked like this: Application Insights for some services, Datadog for others, custom logging in a few places, and manual traces scattered throughout the codebase. Each tool worked fine in isolation, but getting a complete picture of a user request flowing through our system was nearly impossible.

A user reported slow performance. We checked metrics: no spike. We checked logs: found an error, but couldn't correlate it with anything else. We grabbed a sample trace from one service, but the next service in the chain logged differently. Three hours later, we finally found the culprit: an exhausted database connection pool in an obscure service.

This experience pushed us to find a better way. We discovered OpenTelemetry—an open standard for observability that was gaining momentum. Instead of replacing one vendor lock-in with another, we could instrument our code once and send data to any backend we chose. That flexibility changed everything.

Understanding the Three Pillars of OpenTelemetry

OpenTelemetry unifies three types of telemetry data:

Traces

A trace represents the entire journey of a single request through your system. It shows:

  • Which services processed the request
  • How long each operation took
  • Where errors occurred
  • Dependencies between operations

When a user makes a request to your application, a trace captures every step: frontend JavaScript execution, API call, database query, cache lookup, external API call. All connected in a single timeline.

Metrics

Metrics answer the question: "What's happening in aggregate?" They measure:

  • Request rates and latencies
  • Error percentages
  • CPU and memory usage
  • Queue depths and throughput
  • Business metrics (signups, purchases, etc.)

Unlike traces, which are request-specific, metrics are rolled-up statistics. They tell you that your 99th percentile latency is 2 seconds, not that user Alice's request took 2 seconds.
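
Recording custom metrics takes a few lines with the OpenTelemetry metrics API. Here's a minimal sketch; the meter and instrument names are our own illustrative choices:

Code:

import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('my-app');

// Counter: a monotonically increasing total
const checkoutCounter = meter.createCounter('checkout.count', {
  description: 'Completed checkouts'
});

// Histogram: a distribution the backend can turn into percentiles
const latencyHistogram = meter.createHistogram('http.server.duration', {
  unit: 'ms',
  description: 'Inbound request duration'
});

checkoutCounter.add(1, { 'payment.method': 'credit_card' });
latencyHistogram.record(42, { 'http.route': '/api/checkout' });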

Logs

Logs remain important, but in OpenTelemetry they're contextualized. Instead of a log message floating in isolation, it includes trace IDs and span IDs, connecting it to the broader picture.

Code:

2026-03-07T10:15:23Z ERROR [trace_id=abc123] Payment processing failed
// vs
2026-03-07T10:15:23Z ERROR Payment processing failed (the old way)

The first log message can be found instantly by anyone looking at the payment processing trace. The second requires guesswork and hope.
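
Adding trace IDs to logs doesn't require a special logging library. A minimal sketch that reads the active span from the OpenTelemetry API (the JSON log shape is just one possible convention):

Code:

import { trace, context } from '@opentelemetry/api';

function logError(message) {
  const span = trace.getSpan(context.active());
  const correlation = span
    ? {
        trace_id: span.spanContext().traceId,
        span_id: span.spanContext().spanId
      }
    : {};

  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    level: 'ERROR',
    message,
    ...correlation
  }));
}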

Setting Up OpenTelemetry in Node.js Applications

For web development projects, here's how we bootstrap OpenTelemetry:

Code:

import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';

// OTLP over HTTP uses port 4318; /v1/traces and /v1/metrics are the
// standard OTLP paths
const traceExporter = new OTLPTraceExporter({
  url: 'http://otel-collector:4318/v1/traces'
});

const metricExporter = new OTLPMetricExporter({
  url: 'http://otel-collector:4318/v1/metrics'
});

const sdk = new NodeSDK({
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()],
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter
  })
});

sdk.start();
console.log('OpenTelemetry started');

This single initialization automatically instruments:

  • HTTP servers and clients
  • Database clients (Postgres, MySQL, Redis, and more)
  • Outbound calls to external APIs
  • Popular frameworks such as Express and Fastify
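
Auto-instrumentation can also be tuned per library. A sketch of the config shape; option names like ignoreIncomingRequestHook match recent versions of these packages, so verify against the version you have installed:

Code:

getNodeAutoInstrumentations({
  // File-system instrumentation is noisy in most apps; turn it off
  '@opentelemetry/instrumentation-fs': { enabled: false },
  // Skip spans for health check requests
  '@opentelemetry/instrumentation-http': {
    ignoreIncomingRequestHook: (req) => req.url === '/healthz'
  }
});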

Auto-instrumentation is powerful, but custom instrumentation is where you gain real insight:

Code:

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-app');

async function processPayment(userId, amount) {
  const span = tracer.startSpan('payment.process');
  
  try {
    span.setAttributes({
      'user.id': userId,
      'payment.amount': amount,
      'payment.currency': 'USD'
    });

    const result = await chargeCard(userId, amount);
    span.setStatus({ code: SpanStatusCode.OK });
    return result;

  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw error;

  } finally {
    span.end();
  }
}
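
The same pattern can be written with startActiveSpan, which makes the new span active in the current context so child spans created inside the callback are parented automatically. A sketch assuming the same chargeCard helper:

Code:

async function processPaymentActive(userId, amount) {
  return tracer.startActiveSpan('payment.process', async (span) => {
    try {
      const result = await chargeCard(userId, amount);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      span.end();
    }
  });
}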

Browser and Frontend Instrumentation

OpenTelemetry isn't just for the backend. Modern SaaS development requires frontend observability too:

Code:

import { trace } from '@opentelemetry/api';
import { WebTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-web';
import { ZoneContextManager } from '@opentelemetry/context-zone';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';

const provider = new WebTracerProvider({
  resource: new Resource({
    'service.name': 'frontend-app'
  })
});

provider.addSpanProcessor(
  new BatchSpanProcessor(new OTLPTraceExporter())
);
provider.register({
  contextManager: new ZoneContextManager()
});

const tracer = trace.getTracer('app');

// Track user interactions
document.addEventListener('click', (event) => {
  const span = tracer.startSpan('user.interaction.click');
  span.setAttributes({
    'element.id': event.target.id,
    'element.class': event.target.className
  });
  span.end();
});

Deployment Architecture

For cloud solutions, OpenTelemetry follows this pattern:

Code:

┌─────────────────────────────────────┐
│  Your Applications                   │
│  (Node.js, Python, Go, Java, etc.)  │
└────────────────┬────────────────────┘
                 │ OTLP Protocol (HTTP/gRPC)
                 ▼
┌─────────────────────────────────────┐
│  OpenTelemetry Collector            │
│  - Receives telemetry               │
│  - Batches for efficiency           │
│  - Routes to multiple backends      │
└────────┬───────────────┬────────────┘
         │               │
         ▼               ▼
    Jaeger (Traces)  Prometheus (Metrics)

Each application sends telemetry to a central collector, which acts as a router. This provides:

  1. Decoupling: Change backends without redeploying applications
  2. Batching: More efficient network usage
  3. Filtering: Reduce storage costs by dropping unneeded data
  4. Transformation: Enrich telemetry with additional context
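
A minimal collector configuration matching the diagram above: one OTLP receiver, a batch processor, and an exporter per backend. Endpoints and service names are illustrative:

Code:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]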

Practical Instrumentation Patterns

Database Observability

Most frameworks auto-instrument databases, but custom context helps:

Code:

async function queryDatabase(query, params) {
  const span = tracer.startSpan('db.query', {
    attributes: {
      'db.system': 'postgres',
      'db.statement': query.substring(0, 100), // Truncate for safety
      'db.params': params.length
    }
  });

  try {
    const startTime = Date.now();
    const result = await pool.query(query, params);
    span.setAttributes({
      'db.rows_affected': result.rowCount,
      'db.duration_ms': Date.now() - startTime
    });
    return result;
  } catch (error) {
    span.recordException(error);
    throw error;
  } finally {
    span.end();
  }
}

External API Calls

Track third-party integrations:

Code:

async function callExternalAPI(service, endpoint) {
  const startTime = Date.now();
  const span = tracer.startSpan('http.client', {
    attributes: {
      'http.method': 'GET',
      'http.url': `${service}${endpoint}`,
      'http.target': endpoint
    }
  });

  try {
    const response = await fetch(`${service}${endpoint}`);
    span.setAttributes({
      'http.status_code': response.status,
      'http.response_time_ms': Date.now() - startTime
    });
    return response;
  } catch (error) {
    span.recordException(error);
    throw error;
  } finally {
    span.end();
  }
}

Business Logic Instrumentation

This is where OpenTelemetry really shines:

Code:

import { trace, context } from '@opentelemetry/api';

async function checkoutCart(userId, items) {
  const span = tracer.startSpan('checkout.process');
  // Make the checkout span the parent of the child spans below
  const ctx = trace.setSpan(context.active(), span);

  span.setAttributes({
    'user.id': userId,
    'cart.item_count': items.length,
    'cart.total': items.reduce((sum, i) => sum + i.price, 0)
  });

  // Capture business steps as child spans
  const validationSpan = tracer.startSpan('checkout.validation', undefined, ctx);
  validateItems(items);
  validationSpan.end();

  const paymentSpan = tracer.startSpan('checkout.payment', undefined, ctx);
  const paymentResult = await processPayment(userId, items);
  paymentSpan.setAttributes({
    'payment.status': paymentResult.status,
    'payment.method': paymentResult.method
  });
  paymentSpan.end();

  span.end();
  return paymentResult;
}

Sampling Strategies for Cost Control

Collecting telemetry for every request gets expensive at scale. Sampling reduces costs while maintaining insight:

Code:

import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-node';

// Sample 10% of traces; the decision is derived from the trace ID,
// so every span in a trace gets the same verdict
const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1)
});

Better: a custom sampler that always keeps errors while sampling only a fraction of normal traffic:

Code:

import { Sampler, SamplingDecision } from '@opentelemetry/sdk-trace-base';

class AdaptiveSampler implements Sampler {
  shouldSample(context, traceId, spanName, spanKind, attributes) {
    // Always sample spans already flagged as errors
    if (attributes['error'] === true) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }

    // Sample 5% of normal requests
    if (Math.random() < 0.05) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }

    // Drop everything else
    return { decision: SamplingDecision.NOT_RECORD };
  }

  toString() {
    return 'AdaptiveSampler';
  }
}
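
Custom samplers plug into the SDK like the built-in ones. Wrapping ours in a ParentBasedSampler keeps traces complete, because child spans inherit the root span's decision. A sketch reusing the NodeSDK setup from earlier:

Code:

import { ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({ root: new AdaptiveSampler() })
});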

Key Features Comparison

Feature           | Jaeger   | Tempo   | Datadog
------------------|----------|---------|------------
Open Source       | Yes      | Yes     | No
Trace Storage     | Local/ES | S3/GCS  | Proprietary
Cost              | Low      | Low     | High
Ease of Setup     | Medium   | Easy    | Very Easy
Query Flexibility | Good     | Limited | Excellent

For detailed implementation guidance, consult the official OpenTelemetry documentation and explore Jaeger's architecture guide to understand how distributed tracing works at scale. Also review Google Cloud's observability documentation for additional best practices.

Common Pitfalls and Solutions

Too Much Data, Too Little Insight

Don't instrument everything. Focus on:

  • User-facing operations
  • External integrations
  • Error paths
  • Business-critical workflows

Cardinality Explosion

Avoid creating spans with unbounded attributes:

Code:

// Bad: Creates thousands of unique span names
for (let i = 0; i < items.length; i++) {
  tracer.startSpan(`item.process.${items[i].id}`);
}

// Good: Single span with list attribute
const span = tracer.startSpan('items.process');
span.setAttributes({
  'items.count': items.length
});

Performance Impact

OpenTelemetry instrumentation has overhead. Minimize it:

Code:

// Batch exports instead of sending individually
const processor = new BatchSpanProcessor(exporter, {
  maxQueueSize: 2048,
  maxExportBatchSize: 512,
  scheduledDelayMillis: 5000
});

Advanced Instrumentation Strategies

Request Context Propagation

Trace requests across services using W3C Trace Context:

Code:

import { context, defaultTextMapGetter } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();

// Extract trace context from incoming request
const extractedContext = propagator.extract(
  context.active(),
  request.headers,
  defaultTextMapGetter
);

// Set as active context for downstream operations
context.with(extractedContext, async () => {
  // All operations here use the same trace
  await processRequest(request);
});
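
The inject side mirrors extract: before calling a downstream service, write the active context into the outgoing headers. The URL here is illustrative:

Code:

import { defaultTextMapSetter } from '@opentelemetry/api';

const headers = {};
propagator.inject(context.active(), headers, defaultTextMapSetter);
// The downstream service extracts these headers and continues the trace
await fetch('http://downstream-service/api/orders', { headers });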

Custom Resource Attributes

Add metadata to identify your services:

Code:

import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

const resource = Resource.default().merge(
  new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'payment-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.2.3',
    'deployment.environment': process.env.NODE_ENV,
    'git.commit': process.env.GIT_SHA,
    'kubernetes.namespace': process.env.K8S_NAMESPACE
  })
);
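
The resource is then handed to the SDK at startup so every span and metric carries these attributes. A sketch reusing the NodeSDK setup from earlier:

Code:

const sdk = new NodeSDK({
  resource,
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()]
});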

Filtering and Processing Telemetry

Reduce storage costs by filtering unneeded data:

Code:

// Wraps a delegate processor (e.g. BatchSpanProcessor) and only
// forwards spans worth keeping
class FilteringSpanProcessor {
  constructor(delegate) {
    this.delegate = delegate;
  }

  onStart(span, parentContext) {
    this.delegate.onStart(span, parentContext);
  }

  onEnd(span) {
    // Don't export health checks
    if (span.name.includes('health')) return;

    // Drop very fast spans in production (duration is [seconds, nanoseconds])
    const durationMs = span.duration[0] * 1000 + span.duration[1] / 1e6;
    if (durationMs < 1 && process.env.NODE_ENV === 'production') return;

    this.delegate.onEnd(span);
  }

  shutdown() {
    return this.delegate.shutdown();
  }

  forceFlush() {
    return this.delegate.forceFlush();
  }
}
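
Wiring the filter in is a matter of wrapping the batch processor before registering it (provider and exporter as in the earlier setup):

Code:

provider.addSpanProcessor(
  new FilteringSpanProcessor(new BatchSpanProcessor(exporter))
);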

Correlation with Business Events

Connect telemetry to business metrics:

Code:

// In payment processing
async function processPayment(userId: string, amount: number) {
  const span = tracer.startSpan('payment.process');
  
  span.setAttributes({
    'user.id': userId,
    'payment.amount': amount,
    'user.tier': await getUserTier(userId),
    'payment.method': 'credit_card'
  });

  // Track business event
  metrics.recordPayment(amount);

  try {
    const result = await chargeCard(userId, amount);
    span.addEvent('payment.success', {
      'transaction.id': result.transactionId
    });
    return result;
  } catch (error) {
    span.recordException(error);
    metrics.recordPaymentFailure(amount);
    throw error;
  } finally {
    span.end();
  }
}

Deployment and Operations

Docker Container Setup

OpenTelemetry in containers:

Code:

FROM node:18-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
ENV OTEL_EXPORTER_OTLP_INSECURE=true
ENV OTEL_TRACES_EXPORTER=otlp
ENV OTEL_METRICS_EXPORTER=otlp
ENV OTEL_LOGS_EXPORTER=otlp

CMD ["node", "app.js"]

Kubernetes Integration

Use the sidecar pattern to run a collector alongside each application pod:

Code:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-collector
spec:
  containers:
  - name: app
    image: myapp:latest
    env:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://localhost:4318
  - name: otel-collector
    image: otel/opentelemetry-collector:latest
    ports:
    - containerPort: 4318

FAQ

Q: Do I need to use OpenTelemetry? A: If you run multiple services, yes. It's the industry standard. For single monoliths, it's still valuable for understanding performance.

Q: Can I migrate from another tool? A: Yes. OpenTelemetry works alongside existing tools. Gradually migrate by setting up both.

Q: What's the performance overhead? A: Typically 5-15% CPU impact when batched. Auto-instrumentation is more expensive than manual.

Q: How much data should I collect? A: Start with 100% sampling in development, 5-10% in production, 100% for errors.

Q: Can I query OpenTelemetry data? A: Yes, through your backend. Jaeger, Tempo, and others have query UIs.

Q: What about privacy and data retention? A: OpenTelemetry doesn't store data—backends do. Implement retention policies (30-90 days typical).

Q: How do I handle cardinality explosion? A: Avoid using unbounded values (user IDs, order IDs) as attribute keys. Use them as values instead, and limit unique values.

Q: What's the learning curve? A: Basic instrumentation is straightforward. Advanced patterns (sampling, filtering, context propagation) take more time to master.

Moving Forward with OpenTelemetry

Observability is not optional anymore. As systems grow more complex, the ability to see what's happening becomes mission-critical. OpenTelemetry provides the foundation that lets us instrument once and adapt our observability infrastructure as our needs evolve.

Start with auto-instrumentation. It gives you 80% of the value. Then add custom spans for business logic. Ship telemetry to a backend you choose. Move from reactive firefighting to proactive understanding.

The teams we work with—across web development, SaaS, and cloud infrastructure—consistently tell us that OpenTelemetry transformed how they debug production issues. What used to take hours now takes minutes. And more importantly, they catch problems before users notice them.

About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 1000+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
