OpenTelemetry for Node.js: Auto-Instrumentation, Custom Spans, and OTLP Export
Instrument Node.js applications with OpenTelemetry: set up auto-instrumentation for HTTP, Express, and database calls, create custom spans for business logic, export traces via OTLP to Grafana Tempo or Jaeger, and correlate traces with logs and metrics.
OpenTelemetry is the CNCF standard for observability instrumentation — traces, metrics, and logs from a single SDK. The Node.js auto-instrumentation covers HTTP servers, Express/Fastify, pg, Redis, and most common libraries with zero code changes. You add custom spans only for the business logic that matters to your specific application.
The architecture: your app → OTLP → OpenTelemetry Collector → Grafana Tempo (traces) + Prometheus (metrics) + Loki (logs).
Installation
npm install \
@opentelemetry/sdk-node \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-http \
@opentelemetry/exporter-metrics-otlp-http \
@opentelemetry/sdk-metrics \
@opentelemetry/api
Instrumentation Bootstrap
// src/instrumentation.ts
// MUST be loaded before any other imports — use --require flag or Node.js --import
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";
import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-node";
const resource = Resource.default().merge(
new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: process.env.SERVICE_NAME ?? "api-service",
[SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION ?? "unknown",
[SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]:
process.env.NODE_ENV ?? "development",
})
);
const traceExporter = new OTLPTraceExporter({
url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "http://otel-collector:4318"}/v1/traces`,
// For Grafana Cloud, send Basic auth; omit the header entirely when no token is set
headers: process.env.GRAFANA_OTLP_TOKEN
? { Authorization: `Basic ${process.env.GRAFANA_OTLP_TOKEN}` }
: {},
});
const metricExporter = new OTLPMetricExporter({
url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "http://otel-collector:4318"}/v1/metrics`,
});
const sdk = new NodeSDK({
resource,
spanProcessor: new BatchSpanProcessor(traceExporter, {
maxExportBatchSize: 512,
scheduledDelayMillis: 5000,
exportTimeoutMillis: 30000,
}),
metricReader: new PeriodicExportingMetricReader({
exporter: metricExporter,
exportIntervalMillis: 60_000, // Export metrics every 60s
}),
instrumentations: [
getNodeAutoInstrumentations({
// Disable noisy instrumentations
"@opentelemetry/instrumentation-fs": { enabled: false },
"@opentelemetry/instrumentation-dns": { enabled: false },
// Configure HTTP instrumentation
"@opentelemetry/instrumentation-http": {
// Ignore health check endpoints — they create noise
ignoreIncomingRequestHook: (req) => {
return ["/health", "/metrics", "/favicon.ico"].some((path) =>
req.url?.startsWith(path)
);
},
// Record the request body size as metadata (the hook does not capture bodies themselves)
requestHook: (span, request) => {
span.setAttribute(
"http.request.body.size",
Number(request.headers?.["content-length"] ?? 0)
);
},
},
// PostgreSQL: capture query text (careful with PII)
"@opentelemetry/instrumentation-pg": {
addSqlCommenterCommentToQueries: true,
enhancedDatabaseReporting: false, // Don't capture query values
},
}),
],
});
sdk.start();
// Graceful shutdown: flush buffered spans and metrics before the process exits
const shutdown = () => {
sdk.shutdown()
.catch((err) => console.error("OpenTelemetry shutdown failed", err))
.finally(() => process.exit(0));
};
process.on("SIGTERM", shutdown);
process.on("SIGINT", shutdown);
// package.json — load instrumentation before application code
{
"scripts": {
"start": "node --require ./dist/instrumentation.js dist/server.js",
"dev": "tsx --require src/instrumentation.ts src/server.ts"
}
}
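As a cross-check on the exporter URLs in the bootstrap above: OTLP/HTTP resolves endpoints by appending a per-signal path (`/v1/traces`, `/v1/metrics`, `/v1/logs`) to the shared base endpoint, while a signal-specific variable such as `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is used verbatim. A minimal sketch of that resolution rule (the `otlpUrl` helper is ours for illustration, not an SDK export):

```typescript
// Resolve the OTLP/HTTP URL for one signal, mirroring the env var convention:
// a signal-specific endpoint wins and is used as-is; otherwise the per-signal
// path is appended to the shared base endpoint.
type Signal = "traces" | "metrics" | "logs";

function otlpUrl(
  signal: Signal,
  env: Record<string, string | undefined> = process.env
): string {
  const specific = env[`OTEL_EXPORTER_OTLP_${signal.toUpperCase()}_ENDPOINT`];
  if (specific) return specific;
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? "http://localhost:4318";
  // Trim any trailing slash so we never emit "//v1/traces"
  return `${base.replace(/\/+$/, "")}/v1/${signal}`;
}
```

This is why the bootstrap constructs full per-signal URLs: when you pass `url` explicitly to an exporter, it is used as-is rather than treated as a base endpoint.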
Custom Spans for Business Logic
Auto-instrumentation covers infrastructure. Custom spans cover what matters to your business:
// src/services/payment.service.ts
// Assumes a configured `stripe` client and a `validatePaymentMethod` helper defined elsewhere
import { trace, SpanStatusCode, SpanKind } from "@opentelemetry/api";
const tracer = trace.getTracer("payment-service", "1.0.0");
export async function processPayment(params: {
orderId: string;
amount: number;
currency: string;
customerId: string;
}): Promise<{ chargeId: string }> {
// Create a span for the entire payment operation
return tracer.startActiveSpan(
"payment.process",
{
kind: SpanKind.INTERNAL,
attributes: {
"payment.order_id": params.orderId,
"payment.amount": params.amount,
"payment.currency": params.currency,
"payment.customer_id": params.customerId,
},
},
async (span) => {
try {
// Sub-span for validation: try/finally ensures the span ends even if validation throws
const validated = await tracer.startActiveSpan(
"payment.validate",
async (validateSpan) => {
try {
const result = await validatePaymentMethod(params.customerId);
validateSpan.setAttribute("payment.method_type", result.type);
return result;
} finally {
validateSpan.end();
}
}
);
// Sub-span for Stripe API call
const charge = await tracer.startActiveSpan(
"payment.stripe.charge",
{ kind: SpanKind.CLIENT },
async (stripeSpan) => {
stripeSpan.setAttribute("stripe.idempotency_key", `order-${params.orderId}`);
try {
const result = await stripe.charges.create(
{
amount: params.amount,
currency: params.currency,
customer: params.customerId,
},
// Stripe's Node SDK takes the idempotency key as a request option, not a body field
{ idempotencyKey: `order-${params.orderId}` }
);
stripeSpan.setAttribute("stripe.charge_id", result.id);
stripeSpan.setStatus({ code: SpanStatusCode.OK });
stripeSpan.end();
return result;
} catch (err) {
stripeSpan.recordException(err as Error);
stripeSpan.setStatus({
code: SpanStatusCode.ERROR,
message: (err as Error).message,
});
stripeSpan.end();
throw err;
}
}
);
span.setAttribute("payment.charge_id", charge.id);
span.setStatus({ code: SpanStatusCode.OK });
span.end();
return { chargeId: charge.id };
} catch (error) {
span.recordException(error as Error);
span.setStatus({
code: SpanStatusCode.ERROR,
message: (error as Error).message,
});
span.end();
throw error;
}
}
);
}
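The try/record/end choreography above repeats for every span. It can be folded into one wrapper; the sketch below shows the pattern rather than an SDK API. `SpanLike` mirrors only the subset of the `@opentelemetry/api` `Span` interface the wrapper touches, so the example stands alone here; in real code you would pass `tracer.startActiveSpan` bound to your tracer.

```typescript
// Minimal structural stand-in for the parts of Span this wrapper uses
interface SpanLike {
  recordException(e: Error): void;
  setStatus(s: { code: number; message?: string }): void;
  end(): void;
}
// SpanStatusCode values from @opentelemetry/api: UNSET = 0, OK = 1, ERROR = 2
const OK = 1;
const ERROR = 2;

// Run `fn` inside a span: set OK on success, record the exception and set
// ERROR on failure, and always end the span exactly once.
async function withSpan<T>(
  start: (name: string, fn: (span: SpanLike) => Promise<T>) => Promise<T>,
  name: string,
  fn: (span: SpanLike) => Promise<T>
): Promise<T> {
  return start(name, async (span) => {
    try {
      const result = await fn(span);
      span.setStatus({ code: OK });
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: ERROR, message: (err as Error).message });
      throw err;
    } finally {
      span.end();
    }
  });
}
```

With the real API this would be called as `withSpan((n, f) => tracer.startActiveSpan(n, f), "payment.validate", async (span) => { ... })`.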
Trace Context Propagation
Distributed traces need context propagation between services:
// src/lib/http-client.ts
// Inject trace context into outgoing HTTP calls
import { context, propagation } from "@opentelemetry/api";
export async function tracedFetch(
url: string,
options: RequestInit = {}
): Promise<Response> {
// Inject W3C TraceContext headers (traceparent, tracestate)
const headers: Record<string, string> = {
...(options.headers as Record<string, string> ?? {}),
};
propagation.inject(context.active(), headers);
return fetch(url, { ...options, headers });
}
// Usage:
const response = await tracedFetch("https://api.external.com/data", {
headers: { "Content-Type": "application/json" },
});
// Outgoing request now has traceparent header — downstream service
// continues the same trace
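`propagation.inject` writes the `traceparent` header in the W3C Trace Context format: `version-traceid-spanid-flags`, all lowercase hex. The SDK's propagator handles this for you, but a sketch of a parser for the format is useful when debugging raw headers (the `parseTraceparent` helper is ours, not part of the API):

```typescript
interface TraceParent {
  version: string;
  traceId: string; // 16 bytes as 32 hex chars
  spanId: string;  // 8 bytes as 16 hex chars
  sampled: boolean;
}

// Parse a W3C traceparent header; returns null for malformed values.
function parseTraceparent(header: string): TraceParent | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  // All-zero trace or span IDs are invalid per the spec
  if (/^0+$/.test(m[2]) || /^0+$/.test(m[3])) return null;
  return {
    version: m[1],
    traceId: m[2],
    spanId: m[3],
    // Bit 0 of the flags byte is the "sampled" flag
    sampled: (parseInt(m[4], 16) & 1) === 1,
  };
}
```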
Correlating Traces with Logs
// src/lib/logger.ts
// Add trace ID to every log line for correlation in Grafana
import { trace } from "@opentelemetry/api";
import pino from "pino";
const baseLogger = pino({
level: process.env.LOG_LEVEL ?? "info",
formatters: {
log(obj) {
// Inject current trace context into every log line
const span = trace.getActiveSpan();
if (span) {
const spanContext = span.spanContext();
return {
...obj,
trace_id: spanContext.traceId,
span_id: spanContext.spanId,
trace_flags: spanContext.traceFlags,
};
}
return obj;
},
},
});
export const logger = {
info: (msg: string, data?: Record<string, unknown>) =>
baseLogger.info(data ?? {}, msg),
error: (msg: string, data?: Record<string, unknown>) =>
baseLogger.error(data ?? {}, msg),
warn: (msg: string, data?: Record<string, unknown>) =>
baseLogger.warn(data ?? {}, msg),
debug: (msg: string, data?: Record<string, unknown>) =>
baseLogger.debug(data ?? {}, msg),
};
// Log output includes trace_id — paste into Grafana Tempo to jump to the trace
// {"level":"info","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7","msg":"Payment processed"}
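Given structured log lines like the one above, correlation tooling only needs to extract and validate `trace_id`. A hypothetical helper (ours, for illustration) that does exactly that, e.g. to build a Tempo deep link:

```typescript
// Pull a valid trace_id out of a JSON log line; returns null if the line
// is not JSON or carries no well-formed trace_id.
function traceIdFromLog(line: string): string | null {
  try {
    const parsed = JSON.parse(line) as { trace_id?: unknown };
    return typeof parsed.trace_id === "string" &&
      /^[0-9a-f]{32}$/.test(parsed.trace_id)
      ? parsed.trace_id
      : null;
  } catch {
    return null;
  }
}
```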
OpenTelemetry Collector Configuration
# otel-collector/config.yaml
receivers:
otlp:
protocols:
http:
endpoint: "0.0.0.0:4318"
grpc:
endpoint: "0.0.0.0:4317"
processors:
batch:
timeout: 5s
send_batch_size: 1024
# Add environment attributes to all telemetry
resource:
attributes:
- key: deployment.environment
value: ${ENVIRONMENT}
action: upsert
# Filter out health check spans
filter/spans:
error_mode: ignore
traces:
span:
- 'attributes["http.route"] == "/health"'
- 'attributes["http.route"] == "/metrics"'
memory_limiter:
limit_mib: 512
spike_limit_mib: 128
check_interval: 5s
exporters:
# Traces → Grafana Tempo
otlp/tempo:
endpoint: "tempo:4317"
tls:
insecure: true
# Metrics → Prometheus
prometheus:
endpoint: "0.0.0.0:8889"
namespace: "otel"
# Logs → Loki (via OTLP)
loki:
endpoint: "http://loki:3100/loki/api/v1/push"
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, filter/spans, resource, batch]
exporters: [otlp/tempo]
metrics:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
exporters: [prometheus]
logs:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
exporters: [loki]
Custom Metrics
// src/metrics/business-metrics.ts
import { metrics } from "@opentelemetry/api";
const meter = metrics.getMeter("business-metrics", "1.0.0");
// Counters: monotonically increasing
export const paymentCounter = meter.createCounter("payments.total", {
description: "Total payment attempts",
unit: "1",
});
export const paymentRevenue = meter.createCounter("payments.revenue", {
description: "Total payment revenue",
unit: "usd",
});
// Histograms: record distributions (latency, sizes)
export const paymentDuration = meter.createHistogram("payments.duration", {
description: "Payment processing duration",
unit: "ms",
advice: {
explicitBucketBoundaries: [50, 100, 250, 500, 1000, 2500, 5000],
},
});
// Observable gauges: read current value on demand
const activeSubscriptions = meter.createObservableGauge(
"subscriptions.active",
{ description: "Current active subscriptions" }
);
activeSubscriptions.addCallback(async (result) => {
// `db` is the application's pg client or pool
const count = await db.query<{ count: string }>(
"SELECT COUNT(*)::text FROM subscriptions WHERE status = 'active'"
);
result.observe(parseInt(count.rows[0].count, 10));
});
// Usage in payment service:
export function recordPayment(amount: number, currency: string, outcome: "success" | "failure") {
paymentCounter.add(1, { currency, outcome });
if (outcome === "success") {
paymentRevenue.add(amount, { currency });
}
}
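The explicit bucket boundaries above define upper-inclusive buckets: a recorded value lands in the first bucket whose boundary it does not exceed, with one overflow bucket beyond the last boundary. A quick sketch of that mapping (the `bucketIndex` helper is ours, not an SDK function):

```typescript
// Which histogram bucket a value falls into, given explicit boundaries.
// Bucket i covers (boundaries[i-1], boundaries[i]]; bucket boundaries.length
// covers everything above the last boundary.
function bucketIndex(value: number, boundaries: number[]): number {
  for (let i = 0; i < boundaries.length; i++) {
    if (value <= boundaries[i]) return i;
  }
  return boundaries.length; // overflow bucket
}
```

So with the boundaries above, a 120 ms payment is counted in the `(100, 250]` bucket, and a 6 s outlier lands in the overflow bucket rather than being dropped.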
See Also
- Distributed Tracing Architecture — tracing patterns
- Observability: SLIs, SLOs, and Error Budgets — SLO alerting
- Incident On-Call Culture — acting on traces
- AWS Lambda Optimization — Lambda Powertools tracing
Working With Viprasol
OpenTelemetry instrumentation is the foundation of production observability. We instrument Node.js services with auto-instrumentation and custom business spans, set up the Collector pipeline routing to Tempo/Prometheus/Loki, and configure Grafana dashboards that let engineers jump from a slow request in the logs directly to its full distributed trace.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.