Real-Time Analytics in 2026: ClickHouse, Kafka, and Streaming Aggregations
Build real-time analytics with ClickHouse and Apache Kafka. Streaming aggregations, materialized views, Kafka consumer patterns, and cost-effective OLAP for SaaS.
Most SaaS products start analytics the wrong way: SELECT COUNT(*) FROM orders WHERE created_at > now() - interval '30 days' running against their production PostgreSQL. It works until it doesn't — usually the week you land a big customer and your dashboard queries start timing out.
Real-time analytics requires a different storage engine. Transactional databases (PostgreSQL, MySQL) are optimized for row-level operations. Analytical databases (ClickHouse, BigQuery, Redshift) are optimized for scanning millions of rows fast. The trick is getting data from one to the other without a 24-hour delay.
This post covers the full stack: ClickHouse for OLAP storage, Kafka for event streaming, and the patterns for building dashboards that update in seconds rather than hours.
The Analytics Stack Options
| Approach | Latency | Scale | Cost | Complexity |
|---|---|---|---|---|
| Direct PostgreSQL queries | Seconds–minutes | <10M rows | Very low | Very low |
| Materialized views in PG | Minutes (refresh lag) | <50M rows | Low | Low |
| PostgreSQL + TimescaleDB | Seconds | <500M rows | Low | Medium |
| ClickHouse + Kafka | Seconds | Billions of rows | Medium | Medium–High |
| BigQuery + Pub/Sub | Seconds–minutes | Unlimited | High (egress) | Medium |
| Snowflake + Kafka | Minutes | Unlimited | High | Medium |
| Apache Flink + Kafka | Milliseconds | Unlimited | High | Very High |
Decision framework:
- <50M events/month, mostly aggregates: Materialized views in PostgreSQL
- 50M–5B events/month, need sub-second dashboards: ClickHouse
- >5B events/month, complex stream processing: Flink + ClickHouse
- Managed, pay-per-query OK: BigQuery or Snowflake
ClickHouse: Why It's Fast
ClickHouse stores data in column-oriented format. When you query SELECT sum(revenue) FROM events WHERE date > today() - 30, ClickHouse reads only the revenue and date columns — skipping everything else. Combined with vectorized execution and aggressive compression (10–50x on analytical data), query times drop from minutes to milliseconds.
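You can check both effects on your own data. As a quick sketch (assuming the events table defined in the next section, living in a database named analytics), this query reads per-column compression stats from system.columns:
-- Per-column compression ratio for the events table (names assume the schema below)
SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = 'analytics' AND table = 'events'
ORDER BY data_compressed_bytes DESC;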
Setting Up ClickHouse
# docker-compose.yml
services:
clickhouse:
image: clickhouse/clickhouse-server:24.3
ports:
- "8123:8123" # HTTP interface
- "9000:9000" # Native protocol
volumes:
- clickhouse_data:/var/lib/clickhouse
- ./clickhouse/config.xml:/etc/clickhouse-server/config.xml
environment:
CLICKHOUSE_USER: analytics
CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD}
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
ulimits:
nofile:
soft: 262144
hard: 262144
volumes:
clickhouse_data:
Event Schema Design
-- Events table: append-only, immutable
CREATE TABLE events
(
event_id UUID DEFAULT generateUUIDv4(),
tenant_id LowCardinality(String),
user_id String,
session_id String,
event_type LowCardinality(String), -- 'page_view', 'click', 'purchase'
properties String, -- JSON blob
revenue Decimal64(2),
timestamp DateTime64(3, 'UTC'), -- millisecond precision
date Date MATERIALIZED toDate(timestamp), -- auto-derived
-- Pre-extracted common properties (avoid JSON parsing at query time)
page_path String,
referrer String,
user_agent String,
country_code LowCardinality(String),
device_type LowCardinality(String) -- 'desktop', 'mobile', 'tablet'
)
ENGINE = MergeTree()
PARTITION BY (tenant_id, toYYYYMM(date))
ORDER BY (tenant_id, date, event_type, user_id, timestamp)
TTL date + INTERVAL 2 YEAR DELETE
SETTINGS
index_granularity = 8192,
min_bytes_for_wide_part = 10485760;
-- Materialized view: pre-aggregate daily stats per tenant
CREATE MATERIALIZED VIEW daily_stats
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (tenant_id, date, event_type, country_code)
AS SELECT
tenant_id,
date,
event_type,
country_code,
count() AS event_count,
countDistinct(user_id) AS unique_users,
sum(revenue) AS total_revenue
FROM events
GROUP BY tenant_id, date, event_type, country_code;
Key design decisions:
- LowCardinality(String) for columns with <10K distinct values — 3–4x compression
- Partition by (tenant_id, month) for multi-tenant data isolation and efficient pruning
- ORDER BY matching common query patterns — data is sorted on disk, enabling O(log n) lookups
- Materialized view for common aggregations — the query hits the view, not the raw events table (see the comparison below)
- Caveat: unique_users in the SummingMergeTree view is approximate, because distinct counts from separate insert blocks are summed during merges; use AggregatingMergeTree with uniqState/uniqMerge when exact uniques matter
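To see what the materialized view buys you, compare the same daily revenue rollup computed from the raw table and from the pre-aggregated view. A sketch using the tables above (tenant-abc is a placeholder tenant ID):
-- Slow path: scans every matching event row
SELECT date, sum(revenue) AS revenue
FROM events
WHERE tenant_id = 'tenant-abc' AND date >= today() - 30
GROUP BY date;
-- Fast path: scans a handful of pre-aggregated daily_stats rows
SELECT date, sum(total_revenue) AS revenue
FROM daily_stats
WHERE tenant_id = 'tenant-abc' AND date >= today() - 30
GROUP BY date;
Note that reads from a SummingMergeTree table should always re-aggregate with sum(), since background merges may not have collapsed all rows yet.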
Kafka: Streaming Events to ClickHouse
Kafka Producer Setup
// src/kafka/producer.ts
import { Kafka, Partitioners, CompressionTypes } from 'kafkajs';
import { AnalyticsEvent } from '@/types/analytics';
import { logger } from '@/lib/logger'; // assumed app-level structured logger (path illustrative)
const kafka = new Kafka({
clientId: 'api-service',
brokers: process.env.KAFKA_BROKERS?.split(',') ?? ['kafka:9092'],
ssl: process.env.NODE_ENV === 'production',
sasl: process.env.KAFKA_USERNAME ? {
mechanism: 'scram-sha-256',
username: process.env.KAFKA_USERNAME,
password: process.env.KAFKA_PASSWORD!,
} : undefined,
});
const producer = kafka.producer({
  createPartitioner: Partitioners.DefaultPartitioner,
});
// kafkajs has no linger/batch-size producer settings; it batches whatever you pass
// to a single send()/sendBatch() call. Compression is set per send() (see below).
await producer.connect();
export async function trackEvent(event: AnalyticsEvent): Promise<void> {
  await producer.send({
    topic: 'analytics-events',
    // Good balance of speed and ratio; Snappy needs a codec registered (e.g. kafkajs-snappy)
    compression: CompressionTypes.Snappy,
    messages: [
{
key: event.tenantId, // Same tenant → same partition → ordered
value: JSON.stringify({
...event,
timestamp: new Date().toISOString(),
schema_version: 'v2',
}),
headers: {
'event-type': event.eventType,
'schema-version': 'v2',
},
},
],
});
}
// Non-blocking fire-and-forget for high-throughput paths
const eventBuffer: AnalyticsEvent[] = [];
export function trackEventAsync(event: AnalyticsEvent): void {
trackEvent(event).catch((err) => {
logger.error({ err, event }, 'Failed to track event');
// Fallback: write to local buffer for later replay
eventBuffer.push(event);
});
}
ClickHouse Kafka Engine (Native Integration)
ClickHouse has a native Kafka engine — no separate consumer process needed:
-- 1. Kafka engine table (reads from topic)
CREATE TABLE events_kafka
(
event_id String,
tenant_id String,
user_id String,
session_id String,
event_type String,
properties String,
revenue Float64,
timestamp String,
page_path String,
referrer String,
country_code String,
device_type String
)
ENGINE = Kafka
SETTINGS
kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'analytics-events',
kafka_group_name = 'clickhouse-consumer',
kafka_format = 'JSONEachRow',
kafka_num_consumers = 4,
kafka_skip_broken_messages = 100;
-- 2. Materialized view: moves data from Kafka table → events table
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT
toUUID(event_id) AS event_id,
tenant_id,
user_id,
session_id,
event_type,
properties,
toDecimal64(revenue, 2) AS revenue,
parseDateTime64BestEffort(timestamp, 3) AS timestamp,
page_path,
referrer,
country_code,
device_type
FROM events_kafka
WHERE length(tenant_id) > 0; -- Filter malformed rows
This is powerful: the Kafka engine handles consumer group management, offset commits, and batching. The MV transforms and inserts. Zero application code for the consumer.
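One operational note: consumption pauses when the materialized view is detached and resumes when it is attached again, which is the usual way to change the transformation logic without losing offsets. A minimal sketch:
-- Pause ingestion (the Kafka table is only read while a view is attached)
DETACH TABLE events_mv;
-- ...alter the target events table or recreate the view...
-- Resume; the consumer group continues from its last committed offset
ATTACH TABLE events_mv;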
Node.js Analytics API
// src/analytics/queries.ts
import { createClient, ClickHouseClient } from '@clickhouse/client';
const ch: ClickHouseClient = createClient({
host: process.env.CLICKHOUSE_HOST ?? 'http://clickhouse:8123',
username: 'analytics',
password: process.env.CLICKHOUSE_PASSWORD,
database: 'analytics',
clickhouse_settings: {
async_insert: 1,
wait_for_async_insert: 0,
max_execution_time: 30,
},
});
interface DashboardMetrics {
totalRevenue: number;
uniqueUsers: number;
eventBreakdown: { eventType: string; count: number; revenue: number }[];
dailyTrend: { date: string; revenue: number; users: number }[];
}
export async function getDashboardMetrics(
tenantId: string,
days: number = 30,
): Promise<DashboardMetrics> {
  const query = `
    SELECT
      sum(revenue) AS total_revenue,
      sum(users) AS unique_users,
      groupArray((event_type, toUInt64(event_count), revenue)) AS breakdown
    FROM (
      -- Pre-group by event type so the breakdown has one tuple per type
      SELECT
        event_type,
        sum(event_count) AS event_count,
        sum(unique_users) AS users,
        sum(total_revenue) AS revenue
      FROM daily_stats
      WHERE
        tenant_id = {tenantId: String}
        AND date >= today() - {days: UInt32}
      GROUP BY event_type
    )
  `;
const result = await ch.query({
query,
query_params: { tenantId, days },
format: 'JSONEachRow',
});
const rows = await result.json<{
total_revenue: string;
unique_users: string;
breakdown: Array<[string, string, string]>;
}>();
const row = rows[0];
// Daily trend query
const trendResult = await ch.query({
query: `
SELECT
date,
sum(total_revenue) AS revenue,
sum(unique_users) AS users
FROM daily_stats
WHERE tenant_id = {tenantId: String}
AND date >= today() - {days: UInt32}
GROUP BY date
ORDER BY date ASC
`,
query_params: { tenantId, days },
format: 'JSONEachRow',
});
const trend = await trendResult.json<{ date: string; revenue: string; users: string }>();
return {
totalRevenue: parseFloat(row.total_revenue),
uniqueUsers: parseInt(row.unique_users),
eventBreakdown: row.breakdown.map(([eventType, count, revenue]) => ({
eventType,
count: parseInt(count),
revenue: parseFloat(revenue),
})),
dailyTrend: trend.map((t) => ({
date: t.date,
revenue: parseFloat(t.revenue),
users: parseInt(t.users),
})),
};
}
Funnel Analysis
Funnel analysis (conversion rates through a defined sequence of events) is ClickHouse's specialty:
-- Calculate signup → onboarding → purchase funnel
SELECT
tenant_id,
  countDistinctIf(user_id, event_type = 'signup') AS step1_signup,
countDistinctIf(user_id, event_type = 'onboarding_complete') AS step2_onboarding,
countDistinctIf(user_id, event_type = 'first_purchase') AS step3_purchase,
-- Conversion rates
step2_onboarding / step1_signup * 100 AS signup_to_onboarding_pct,
step3_purchase / step2_onboarding * 100 AS onboarding_to_purchase_pct
FROM events
WHERE
tenant_id = 'tenant-abc'
AND date >= today() - 30
AND event_type IN ('signup', 'onboarding_complete', 'first_purchase')
GROUP BY tenant_id;
-- Ordered funnel with time constraints (must complete steps within 7 days)
SELECT
level,
count() AS users
FROM (
SELECT
user_id,
windowFunnel(7 * 24 * 3600)( -- 7 day window in seconds
toUnixTimestamp(timestamp),
event_type = 'signup',
event_type = 'onboarding_complete',
event_type = 'first_purchase'
) AS level
FROM events
WHERE tenant_id = 'tenant-abc' AND date >= today() - 30
GROUP BY user_id
)
GROUP BY level
ORDER BY level;
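windowFunnel returns the furthest step each user reached inside the window, so the query above reports users per exact level. For classic funnel counts ("reached at least step N"), accumulate over that output. A sketch reusing the same subquery:
-- Cumulative funnel: users who reached at least each step within 7 days
SELECT
    countIf(level >= 1) AS reached_signup,
    countIf(level >= 2) AS completed_onboarding,
    countIf(level >= 3) AS made_first_purchase
FROM (
    SELECT
        user_id,
        windowFunnel(7 * 24 * 3600)(
            toUnixTimestamp(timestamp),
            event_type = 'signup',
            event_type = 'onboarding_complete',
            event_type = 'first_purchase'
        ) AS level
    FROM events
    WHERE tenant_id = 'tenant-abc' AND date >= today() - 30
    GROUP BY user_id
);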
Retention Cohort Analysis
-- Weekly retention cohorts
WITH
    cohort AS (
        -- First signup week per user
        SELECT user_id, toStartOfWeek(min(date)) AS cohort_week
        FROM events
        WHERE tenant_id = 'tenant-abc' AND event_type = 'signup'
        GROUP BY user_id
    ),
    activity AS (
        -- Weeks in which each user did anything
        SELECT DISTINCT user_id, toStartOfWeek(date) AS activity_week
        FROM events
        WHERE tenant_id = 'tenant-abc'
    ),
    cohort_sizes AS (
        SELECT cohort_week, count() AS users_in_cohort
        FROM cohort
        GROUP BY cohort_week
    )
SELECT
    c.cohort_week AS cohort_week,
    dateDiff('week', c.cohort_week, a.activity_week) AS week_number,
    any(s.users_in_cohort) AS users_in_cohort,
    count(DISTINCT a.user_id) AS retained_users,
    round(count(DISTINCT a.user_id) / any(s.users_in_cohort) * 100, 1) AS retention_pct
FROM cohort AS c
INNER JOIN activity AS a ON c.user_id = a.user_id
INNER JOIN cohort_sizes AS s ON c.cohort_week = s.cohort_week
WHERE week_number BETWEEN 0 AND 8
GROUP BY c.cohort_week, week_number
ORDER BY c.cohort_week, week_number;
Infrastructure Cost Estimates
| Scale (events/month) | ClickHouse Setup | Kafka Setup | Monthly Cost |
|---|---|---|---|
| <100M | 1× c6i.xlarge (4 vCPU, 8GB) | MSK kafka.t3.small | $150–$300 |
| 100M–1B | 3-node cluster c6i.2xlarge | MSK kafka.m5.large (3 broker) | $600–$1,200 |
| 1B–10B | 6-node cluster c6i.4xlarge | MSK kafka.m5.xlarge (6 broker) | $2,500–$5,000 |
| >10B | ClickHouse Cloud / Altinity | Confluent Cloud | $8,000–$25,000 |
Cost optimization:
- Use S3-backed ClickHouse tiered storage for cold data (90% cheaper; see the TTL sketch after this list)
- Enable Kafka log compaction on CDC topics
- Compress with Snappy/LZ4 (50–70% size reduction)
- TTL policies to auto-expire old data
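For the tiered-storage point above, ClickHouse moves older partitions to cheaper disks via TTL ... TO VOLUME rules. A sketch, assuming a storage policy (named hot_cold here, an illustrative name) with an S3-backed cold volume is already declared in the server config and set on the table via SETTINGS storage_policy = 'hot_cold':
-- Keep ~90 days on local disk, move older partitions to the S3-backed volume, drop after 2 years
ALTER TABLE events
    MODIFY TTL
        date + INTERVAL 90 DAY TO VOLUME 'cold',
        date + INTERVAL 2 YEAR DELETE;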
Working With Viprasol
Our data engineering team builds production analytics pipelines for SaaS products — from initial ClickHouse setup to real-time dashboards that handle billion-event scale.
What we deliver:
- ClickHouse cluster design (schema, partitioning, materialized views)
- Kafka topic architecture and producer/consumer patterns
- Multi-tenant analytics APIs with row-level isolation
- Grafana or custom React dashboard implementations
- Migration from PostgreSQL to ClickHouse with zero downtime
→ Talk about your analytics requirements → AI and data engineering services
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.