AWS CloudWatch Logs Insights: Query Patterns, Dashboards, Alarms, and Structured Logging
Master AWS CloudWatch Logs Insights for production observability. Covers structured JSON logging from Node.js, Logs Insights query syntax for errors and latency, CloudWatch dashboards, metric filters, and alarm configuration with Terraform.
Application logs are often the difference between a 5-minute incident resolution and a 5-hour one. CloudWatch Logs Insights lets you query structured JSON logs with a SQL-like language, detect anomalies, build dashboards, and trigger alarms — all within the AWS ecosystem. The catch: if your logs aren't structured, Logs Insights is much less powerful.
This guide covers structured logging setup, essential query patterns, metric filters, dashboard configuration, and alarm setup with Terraform.
Structured Logging in Node.js
// lib/logger.ts — structured JSON logger using pino
import pino from "pino";
export const logger = pino({
level: process.env.LOG_LEVEL ?? "info",
// CloudWatch doesn't need pretty printing — raw JSON is faster to ingest and query
formatters: {
level: (label) => ({ level: label }), // level: "info" not level: 30
bindings: () => ({}), // Remove pid/hostname (CloudWatch adds context)
},
base: {
service: process.env.SERVICE_NAME ?? "app",
environment: process.env.NODE_ENV,
version: process.env.APP_VERSION,
},
// Redact sensitive fields before they hit logs
redact: {
paths: ["password", "token", "authorization", "*.password", "*.secret"],
censor: "[REDACTED]",
},
timestamp: pino.stdTimeFunctions.isoTime,
});
// Typed child logger for request context
export function createRequestLogger(requestId: string, userId?: string) {
return logger.child({ requestId, userId });
}
// middleware/logging.ts — Next.js middleware or Express logger
import { NextRequest, NextResponse } from "next/server";
import { nanoid } from "nanoid";
import { logger } from "@/lib/logger";
export function withLogging(
handler: (req: NextRequest, log: ReturnType<typeof logger.child>) => Promise<NextResponse>
) {
return async (req: NextRequest): Promise<NextResponse> => {
const requestId = req.headers.get("x-request-id") ?? nanoid();
const log = logger.child({
requestId,
method: req.method,
url: req.nextUrl.pathname,
});
const start = Date.now();
try {
const response = await handler(req, log);
log.info({
event: "request_completed",
statusCode: response.status,
durationMs: Date.now() - start,
});
return response;
} catch (err) {
log.error({
event: "request_error",
error: err instanceof Error ? err.message : String(err),
stack: err instanceof Error ? err.stack : undefined,
durationMs: Date.now() - start,
});
throw err;
}
};
}
Example Structured Log Output
{
"level": "info",
"time": "2027-04-23T09:14:32.105Z",
"service": "api",
"environment": "production",
"version": "1.4.2",
"requestId": "req_01abc123",
"userId": "usr_abc456",
"event": "request_completed",
"method": "POST",
"url": "/api/invoices",
"statusCode": 201,
"durationMs": 142
}
Logs Insights Query Patterns
# 1. Error rate over time
filter level = "error"
| stats count(*) as errors by bin(5m) as period
| sort period asc
# 2. Top 10 slowest endpoints (p99 latency)
filter ispresent(durationMs)
| stats
count(*) as requests,
avg(durationMs) as avg_ms,
pct(durationMs, 95) as p95_ms,
pct(durationMs, 99) as p99_ms
by url
| sort p99_ms desc
| limit 10
# 3. Error breakdown by type
filter level = "error"
| stats count(*) as count by error
| sort count desc
| limit 20
# 4. Request volume per user (detect heavy users or abuse)
filter ispresent(userId) and event = "request_completed"
| stats count(*) as requests by userId
| sort requests desc
| limit 25
# 5. 5xx error rate by endpoint
filter event = "request_completed"
| stats
count(*) as total,
sum(statusCode >= 500 and statusCode < 600) as errors_5xx
by url
| fields url, total, errors_5xx,
round(errors_5xx / total * 100, 2) as error_rate_pct
| filter total > 10 # Only endpoints with meaningful volume
| sort error_rate_pct desc
# 6. Specific user journey: trace a requestId
filter requestId = "req_01abc123"
| fields time, level, event, statusCode, durationMs, error
| sort time asc
# 7. Database slow queries (if you log query durations)
filter event = "db_query" and queryMs > 500
| stats
count(*) as count,
avg(queryMs) as avg_ms,
max(queryMs) as max_ms
by query
| sort avg_ms desc
| limit 15
# 8. Failed logins by IP (security monitoring)
filter event = "login_failed"
| stats count(*) as attempts by ip
| filter attempts > 5
| sort attempts desc
# 9. Deployment impact: error rate before vs. after version change
filter ispresent(version)
| stats sum(level = "error") as errors, count(*) as total by version
| fields version, errors, total, round(errors/total*100,2) as error_pct
| sort version asc
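The queries above all rely on structured JSON fields. If some log lines are plain text (the caveat from the intro), the `parse` command can extract ad-hoc fields with a glob pattern. A hedged sketch, assuming hypothetical log lines shaped like `ERROR payment declined for user usr_123` (the field names `failure_reason` and `parsed_user` are invented for illustration):

```
# 10. Extracting fields from unstructured lines with parse (sketch)
parse @message "ERROR * for user *" as failure_reason, parsed_user
| filter ispresent(parsed_user)
| stats count(*) as failures by failure_reason, parsed_user
| sort failures desc
| limit 20
```

This works, but every query has to repeat the parse step, and you lose type-aware comparisons like `statusCode >= 500` — which is why structured JSON at the source is worth the setup cost.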
Metric Filters: Convert Logs to Metrics
# terraform/cloudwatch-metrics.tf
# Error count metric from structured logs
resource "aws_cloudwatch_log_metric_filter" "error_count" {
name = "${var.app_name}-error-count"
log_group_name = aws_cloudwatch_log_group.app.name
# Match JSON logs with level = "error"
pattern = "{ $.level = \"error\" }"
metric_transformation {
name = "ErrorCount"
namespace = "App/${var.app_name}"
value = "1"
unit = "Count"
# Include dimensions for filtering in dashboards
dimensions = {
Environment = "$.environment"
}
}
}
# p99 latency metric (requires numeric extraction)
resource "aws_cloudwatch_log_metric_filter" "request_latency" {
name = "${var.app_name}-request-latency"
log_group_name = aws_cloudwatch_log_group.app.name
pattern = "{ $.event = \"request_completed\" && $.durationMs > 0 }"
metric_transformation {
name = "RequestLatency"
namespace = "App/${var.app_name}"
value = "$.durationMs"
unit = "Milliseconds"
dimensions = {
Environment = "$.environment"
}
}
}
# 5xx error metric
resource "aws_cloudwatch_log_metric_filter" "server_errors" {
name = "${var.app_name}-5xx-errors"
log_group_name = aws_cloudwatch_log_group.app.name
pattern = "{ $.statusCode >= 500 && $.statusCode < 600 }"
metric_transformation {
name = "ServerErrors"
namespace = "App/${var.app_name}"
value = "1"
unit = "Count"
}
}
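A metric filter pattern like `{ $.level = "error" }` matches JSON log events by field predicate; every matching event contributes the transformation's `value` (here `1`) to the metric. As a mental model only — this simplified sketch mimics what the error filter selects, not CloudWatch's full filter syntax (which also supports `&&`, `||`, numeric comparisons, and wildcards):

```javascript
// Simplified sketch of what the `{ $.level = "error" }` pattern selects.
// Not CloudWatch's actual matcher — just the semantics for this one pattern.
function matchesErrorFilter(logLine) {
  try {
    const event = JSON.parse(logLine);
    return event.level === "error";
  } catch {
    return false; // non-JSON lines never match a JSON filter pattern
  }
}

console.log(matchesErrorFilter('{"level":"error","event":"request_error"}')); // matches
console.log(matchesErrorFilter('{"level":"info"}'));                          // does not
console.log(matchesErrorFilter("plain text line"));                           // does not
```

Note that the `dimensions` block creates one custom metric per unique dimension-value combination, and custom metrics are billed individually, so keep dimensions low-cardinality (e.g. `Environment`, never `userId`).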
Alarms
# terraform/cloudwatch-alarms.tf
# High error rate alarm
resource "aws_cloudwatch_metric_alarm" "high_error_rate" {
alarm_name = "${var.app_name}-high-error-rate"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "ErrorCount"
namespace = "App/${var.app_name}"
period = 60 # 1-minute periods
statistic = "Sum"
threshold = 10 # Alert if >10 errors per minute
treat_missing_data = "notBreaching"
alarm_description = "Error count exceeded 10 per minute for 2 consecutive minutes"
alarm_actions = [aws_sns_topic.alerts.arn]
ok_actions = [aws_sns_topic.alerts.arn]
dimensions = { Environment = var.environment }
}
# High latency alarm (p99 > 2 seconds)
resource "aws_cloudwatch_metric_alarm" "high_latency" {
alarm_name = "${var.app_name}-high-p99-latency"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 3
extended_statistic = "p99"
metric_name = "RequestLatency"
namespace = "App/${var.app_name}"
period = 60
threshold = 2000 # ms
treat_missing_data = "notBreaching"
alarm_description = "p99 latency > 2s for 3 consecutive minutes"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = { Environment = var.environment }
}
# Zero requests alarm (app might be down)
resource "aws_cloudwatch_metric_alarm" "no_requests" {
alarm_name = "${var.app_name}-no-requests"
comparison_operator = "LessThanThreshold"
evaluation_periods = 3
metric_name = "RequestLatency"
namespace = "App/${var.app_name}"
period = 60
statistic = "SampleCount"
threshold = 1
treat_missing_data = "breaching" # Missing data = app is down
alarm_description = "No requests received for 3 consecutive minutes"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = { Environment = var.environment }
}
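All three alarms publish to `aws_sns_topic.alerts`, which isn't defined above. A minimal sketch of the missing resource — `var.alert_email` is an assumed variable, and email subscriptions must be confirmed manually by the recipient before they deliver:

```
# terraform/sns.tf — alerts topic referenced by the alarms above (sketch)
resource "aws_sns_topic" "alerts" {
  name = "${var.app_name}-${var.environment}-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email # recipient must confirm the subscription
}
```

In production you'd more likely point this topic at a Slack webhook via Lambda or at PagerDuty, but the alarm wiring is identical.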
CloudWatch Dashboard
# terraform/cloudwatch-dashboard.tf
resource "aws_cloudwatch_dashboard" "app" {
dashboard_name = "${var.app_name}-${var.environment}"
dashboard_body = jsonencode({
widgets = [
{
type = "metric"
x = 0, y = 0, width = 12, height = 6
properties = {
title = "Error Rate"
period = 60
metrics = [[
"App/${var.app_name}", "ErrorCount",
"Environment", var.environment,
{ stat = "Sum", color = "#d62728" }
]]
view = "timeSeries"
yAxis = { left = { min = 0 } }
annotations = {
horizontal = [{ value = 10, label = "Alert threshold", color = "#ff7f0e" }]
}
}
},
{
type = "metric"
x = 12, y = 0, width = 12, height = 6
properties = {
title = "Request Latency"
period = 60
metrics = [
["App/${var.app_name}", "RequestLatency", "Environment", var.environment, { stat = "p50", label = "p50" }],
["App/${var.app_name}", "RequestLatency", "Environment", var.environment, { stat = "p95", label = "p95" }],
["App/${var.app_name}", "RequestLatency", "Environment", var.environment, { stat = "p99", label = "p99", color = "#d62728" }],
]
view = "timeSeries"
}
},
{
type = "log"
x = 0, y = 6, width = 24, height = 8
properties = {
title = "Recent Errors"
query = "SOURCE '${aws_cloudwatch_log_group.app.name}' | filter level = \"error\" | fields time, event, error, url, userId | sort time desc | limit 50"
region = var.aws_region
view = "table"
}
}
]
})
}
Log Retention and Cost
resource "aws_cloudwatch_log_group" "app" {
name = "/app/${var.app_name}/${var.environment}"
retention_in_days = var.environment == "production" ? 90 : 14
tags = { Environment = var.environment }
}
CloudWatch pricing (2026):
| Component | Price |
|---|---|
| Log ingestion | $0.50/GB |
| Log storage | $0.03/GB/month |
| Logs Insights queries | $0.005/GB scanned |
| Metric filters | Free (subject to a per-log-group quota) |
| Custom metrics | $0.30/metric/month |
| Alarms | $0.10/alarm/month |
For a medium SaaS app ingesting 10 GB/day: ingestion costs $5/day (~$150/month). With 90-day retention, steady-state storage is ~900 GB, or ~$27/month at $0.03/GB-month.
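The arithmetic generalizes to a simple back-of-envelope model using the prices in the table above. Query and custom-metric costs are excluded, so treat this as a rough estimate, not a billing calculator:

```javascript
// Rough monthly CloudWatch Logs cost from daily ingestion volume and retention.
// Prices: $0.50/GB ingested, $0.03/GB-month stored (from the table above).
function monthlyLogCost(gbPerDay, retentionDays) {
  const ingestion = gbPerDay * 0.5 * 30;     // 30-day month
  const storedGb = gbPerDay * retentionDays; // steady-state stored volume
  const storage = storedGb * 0.03;
  return { ingestion, storage, total: ingestion + storage };
}

// 10 GB/day with 90-day retention: $150 ingestion + $27 storage, ~$177/month
console.log(monthlyLogCost(10, 90));
```

Ingestion, not storage, dominates at typical retention windows — the highest-leverage cost lever is usually logging less (drop debug logs in production), not retaining less.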
See Also
- AWS CloudWatch Observability and Dashboards
- AWS CloudTrail Audit Logging
- OpenTelemetry in Node.js
- Node.js Performance Profiling
- AWS ECS Fargate Production Setup
Working With Viprasol
Observability is what separates teams that fix incidents in 5 minutes from teams that spend 5 hours correlating log files from three different places. Our team sets up structured JSON logging, Logs Insights query libraries, metric filters for error rate and latency, and CloudWatch dashboards that give you situational awareness at a glance.
What we deliver:
- pino structured logger with redaction for sensitive fields
- Request middleware with requestId, durationMs, statusCode
- Logs Insights saved queries for errors, latency p99, 5xx rate, user journeys
- Terraform: metric filters, CloudWatch alarms, and dashboard
- Log group retention policy matched to compliance requirements
Talk to our team about your observability stack →
Or explore our cloud infrastructure services.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.