
Real-Time Analytics in 2026: ClickHouse, Kafka, and Streaming Aggregations

Build real-time analytics with ClickHouse and Apache Kafka. Streaming aggregations, materialized views, Kafka consumer patterns, and cost-effective OLAP for SaaS.

Viprasol Tech Team
July 8, 2026
14 min read


Most SaaS products start analytics the wrong way: SELECT COUNT(*) FROM orders WHERE created_at > now() - interval '30 days' running against their production PostgreSQL. It works until it doesn't — usually the week you land a big customer and your dashboard queries start timing out.

Real-time analytics requires a different storage engine. Transactional databases (PostgreSQL, MySQL) are optimized for row-level operations. Analytical databases (ClickHouse, BigQuery, Redshift) are optimized for scanning millions of rows fast. The trick is getting data from one to the other without a 24-hour delay.

This post covers the full stack: ClickHouse for OLAP storage, Kafka for event streaming, and the patterns for building dashboards that update in seconds rather than hours.


The Analytics Stack Options

| Approach | Latency | Scale | Cost | Complexity |
|---|---|---|---|---|
| Direct PostgreSQL queries | Seconds–minutes | <10M rows | Very low | Very low |
| Materialized views in PG | Minutes (refresh lag) | <50M rows | Low | Low |
| PostgreSQL + TimescaleDB | Seconds | <500M rows | Low | Medium |
| ClickHouse + Kafka | Seconds | Billions of rows | Medium | Medium–High |
| BigQuery + Pub/Sub | Seconds–minutes | Unlimited | High (egress) | Medium |
| Snowflake + Kafka | Minutes | Unlimited | High | Medium |
| Apache Flink + Kafka | Milliseconds | Unlimited | High | Very High |

Decision framework:

  • <50M events/month, mostly aggregates: Materialized views in PostgreSQL
  • 50M–5B events/month, need sub-second dashboards: ClickHouse
  • >5B events/month, complex stream processing: Flink + ClickHouse
  • Managed, pay-per-query OK: BigQuery or Snowflake
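The thresholds above can be encoded as a tiny helper. The numbers are this article's rules of thumb, not hard cutoffs, and the managed pay-per-query option is orthogonal (it depends on team preference, not volume), so it's omitted:

```typescript
// Rough encoding of the decision framework above. Thresholds are rules
// of thumb from the list, not hard limits.
type Stack =
  | 'postgres-materialized-views'
  | 'clickhouse'
  | 'flink-plus-clickhouse';

function pickStack(eventsPerMonth: number): Stack {
  if (eventsPerMonth < 50_000_000) return 'postgres-materialized-views';
  if (eventsPerMonth <= 5_000_000_000) return 'clickhouse';
  return 'flink-plus-clickhouse';
}

// Example: 800M events/month lands squarely in ClickHouse territory.
console.log(pickStack(800_000_000)); // 'clickhouse'
```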

ClickHouse: Why It's Fast

ClickHouse stores data in column-oriented format. When you query SELECT sum(revenue) FROM events WHERE date > today() - 30, ClickHouse reads only the revenue and date columns — skipping everything else. Combined with vectorized execution and aggressive compression (10–50x on analytical data), query times drop from minutes to milliseconds.
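Back-of-envelope arithmetic makes the effect concrete. Assume a hypothetical 100M-row table with 20 columns averaging 50 bytes each, and a query that touches only two of them:

```typescript
// Bytes scanned by a row store vs a column store for a query touching
// 2 of 20 columns. All sizes are illustrative, before compression.
const rows = 100_000_000;
const totalColumns = 20;
const avgBytesPerColumn = 50;
const columnsTouched = 2;

const rowStoreBytes = rows * totalColumns * avgBytesPerColumn;      // reads every column
const columnStoreBytes = rows * columnsTouched * avgBytesPerColumn; // reads only what the query needs

console.log(rowStoreBytes / columnStoreBytes); // 10x less I/O before compression even kicks in
```

Stack 10–50x compression on top of that 10x I/O reduction and the minutes-to-milliseconds claim stops looking like marketing.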

Setting Up ClickHouse

# docker-compose.yml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.3
    ports:
      - "8123:8123"  # HTTP interface
      - "9000:9000"  # Native protocol
    volumes:
      - clickhouse_data:/var/lib/clickhouse
      - ./clickhouse/config.xml:/etc/clickhouse-server/config.xml
    environment:
      CLICKHOUSE_USER: analytics
      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD}
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse_data:

Event Schema Design

-- Events table: append-only, immutable
CREATE TABLE events
(
    event_id     UUID DEFAULT generateUUIDv4(),
    tenant_id    LowCardinality(String),
    user_id      String,
    session_id   String,
    event_type   LowCardinality(String),  -- 'page_view', 'click', 'purchase'
    properties   String,                  -- JSON blob
    revenue      Decimal64(2),
    timestamp    DateTime64(3, 'UTC'),    -- millisecond precision
    date         Date MATERIALIZED toDate(timestamp),  -- auto-derived

    -- Pre-extracted common properties (avoid JSON parsing at query time)
    page_path    String,
    referrer     String,
    user_agent   String,
    country_code LowCardinality(String),
    device_type  LowCardinality(String)   -- 'desktop', 'mobile', 'tablet'
)
ENGINE = MergeTree()
PARTITION BY (tenant_id, toYYYYMM(date))
ORDER BY (tenant_id, date, event_type, user_id, timestamp)
TTL date + INTERVAL 2 YEAR DELETE
SETTINGS
    index_granularity = 8192,
    min_bytes_for_wide_part = 10485760;

-- Materialized view: pre-aggregate daily stats per tenant
CREATE MATERIALIZED VIEW daily_stats
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (tenant_id, date, event_type, country_code)
AS SELECT
    tenant_id,
    date,
    event_type,
    country_code,
    count()                 AS event_count,
    -- NOTE: SummingMergeTree simply adds these per-insert-block distinct
    -- counts at merge time, so unique_users over-counts users who appear
    -- in more than one batch or day. For exact uniques, use
    -- AggregatingMergeTree with uniqState / uniqMerge instead.
    countDistinct(user_id)  AS unique_users,
    sum(revenue)            AS total_revenue
FROM events
GROUP BY tenant_id, date, event_type, country_code;

Key design decisions:

  • LowCardinality(String) for columns with <10K distinct values — 3–4x compression
  • Partition by (tenant_id, month) for multi-tenant isolation and efficient pruning — fine with tens of tenants, but with thousands the part count explodes; partition by month alone and let the ORDER BY prefix prune by tenant
  • ORDER BY matching common query patterns — data is sorted on disk, so the sparse primary index skips irrelevant granules
  • Materialized view for common aggregations — queries hit the view, not the raw events table
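A toy dictionary encoding shows where the LowCardinality savings come from: each distinct string is stored once and every row holds a small index. ClickHouse's on-disk format is more involved, but the size math is similar in spirit:

```typescript
// Toy dictionary encoding: distinct values stored once, each row holds
// a 1-byte index (assumes <256 distinct values). Illustrative only.
function dictionaryEncodedBytes(values: string[]): number {
  const dict = [...new Set(values)];
  const dictBytes = dict.reduce((sum, v) => sum + v.length, 0);
  return dictBytes + values.length; // dictionary + 1 index byte per row
}

function plainEncodedBytes(values: string[]): number {
  return values.reduce((sum, v) => sum + v.length, 0);
}

const countries = Array.from({ length: 10_000 }, (_, i) => ['US', 'IN', 'DE'][i % 3]);
console.log(plainEncodedBytes(countries));      // 20000 (2 bytes x 10k rows)
console.log(dictionaryEncodedBytes(countries)); // 10006 (6-byte dict + 1 byte/row)
```

With longer strings (user agents, referrers) and sub-byte index packing, the gap widens toward the 3–4x the bullet above quotes.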


Kafka: Streaming Events to ClickHouse

Kafka Topic Setup

// src/kafka/producer.ts
import { Kafka, Partitioners, CompressionTypes } from 'kafkajs';
import { AnalyticsEvent } from '@/types/analytics';

const kafka = new Kafka({
  clientId: 'api-service',
  brokers: process.env.KAFKA_BROKERS?.split(',') ?? ['kafka:9092'],
  ssl: process.env.NODE_ENV === 'production',
  sasl: process.env.KAFKA_USERNAME ? {
    mechanism: 'scram-sha-256',
    username: process.env.KAFKA_USERNAME,
    password: process.env.KAFKA_PASSWORD!,
  } : undefined,
});

const producer = kafka.producer({
  createPartitioner: Partitioners.DefaultPartitioner,
  allowAutoTopicCreation: false,
  retry: { retries: 5 },
});

await producer.connect();

export async function trackEvent(event: AnalyticsEvent): Promise<void> {
  await producer.send({
    topic: 'analytics-events',
    // kafkajs sets compression per send() call — there is no producer-level
    // compression or linger/batch-size option as in the Java client. Snappy
    // is a good balance of speed and ratio (requires registering the
    // kafkajs-snappy codec); batch by passing multiple entries in `messages`.
    compression: CompressionTypes.Snappy,
    messages: [
      {
        key: event.tenantId, // Same tenant → same partition → ordered
        value: JSON.stringify({
          ...event,
          timestamp: new Date().toISOString(),
          schema_version: 'v2',
        }),
        headers: {
          'event-type': event.eventType,
          'schema-version': 'v2',
        },
      },
    ],
  });
}

// Non-blocking fire-and-forget for high-throughput paths.
// `logger` and `eventBuffer` are app-level utilities assumed to exist
// (e.g. a pino instance and a bounded retry buffer).
export function trackEventAsync(event: AnalyticsEvent): void {
  trackEvent(event).catch((err) => {
    logger.error({ err, event }, 'Failed to track event');
    // Fallback: write to local buffer for later replay
    eventBuffer.push(event);
  });
}
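A minimal bounded implementation of the `eventBuffer` fallback used above might look like this — our own sketch, not a kafkajs API:

```typescript
// Minimal bounded buffer for events that failed to reach Kafka.
// Purely illustrative: names and limits are our own.
class EventBuffer<T> {
  private items: T[] = [];
  constructor(private maxSize = 10_000) {}

  push(event: T): void {
    if (this.items.length >= this.maxSize) this.items.shift(); // drop oldest under pressure
    this.items.push(event);
  }

  get size(): number {
    return this.items.length;
  }

  // Hand buffered events to a sender; re-buffer anything that fails again.
  async drainWith(send: (event: T) => Promise<void>): Promise<number> {
    const pending = this.items.splice(0);
    let sent = 0;
    for (const event of pending) {
      try {
        await send(event);
        sent += 1;
      } catch {
        this.items.push(event); // keep for the next drain attempt
      }
    }
    return sent;
  }
}
```

Call `drainWith(trackEvent)` from a `setInterval` so transient broker outages only delay events instead of losing them; the bounded size keeps a long outage from exhausting memory.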

ClickHouse Kafka Engine (Native Integration)

ClickHouse has a native Kafka engine — no separate consumer process needed:

-- 1. Kafka engine table (reads from topic)
CREATE TABLE events_kafka
(
    event_id   String,
    tenant_id  String,
    user_id    String,
    session_id String,
    event_type String,
    properties String,
    revenue    Float64,
    timestamp  String,
    page_path  String,
    referrer   String,
    country_code String,
    device_type String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'analytics-events',
    kafka_group_name = 'clickhouse-consumer',
    kafka_format = 'JSONEachRow',
    kafka_num_consumers = 4,
    kafka_skip_broken_messages = 100;

-- 2. Materialized view: moves data from Kafka table → events table
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT
    toUUID(event_id)              AS event_id,
    tenant_id,
    user_id,
    session_id,
    event_type,
    properties,
    toDecimal64(revenue, 2)       AS revenue,
    parseDateTime64BestEffort(timestamp, 3) AS timestamp,
    page_path,
    referrer,
    country_code,
    device_type
FROM events_kafka
WHERE length(tenant_id) > 0;  -- Filter malformed rows

This is powerful: the Kafka engine handles consumer group management, offset commits, and batching. The MV transforms and inserts. Zero application code for the consumer.
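One caveat: with kafka_skip_broken_messages set, malformed JSON is dropped silently on the ClickHouse side. A cheap producer-side guard that mirrors the events_kafka columns fails loudly in the app instead — a sketch, with field names taken from the table above:

```typescript
// Producer-side shape check mirroring the events_kafka columns, so bad
// payloads surface in app logs instead of being skipped by ClickHouse.
const REQUIRED_STRING_FIELDS = [
  'event_id', 'tenant_id', 'user_id', 'session_id',
  'event_type', 'timestamp',
] as const;

function isValidKafkaRow(payload: Record<string, unknown>): boolean {
  const stringsOk = REQUIRED_STRING_FIELDS.every(
    (f) => typeof payload[f] === 'string' && (payload[f] as string).length > 0,
  );
  const revenueOk = payload.revenue === undefined || typeof payload.revenue === 'number';
  return stringsOk && revenueOk;
}

console.log(isValidKafkaRow({
  event_id: 'e1', tenant_id: 't1', user_id: 'u1',
  session_id: 's1', event_type: 'page_view',
  timestamp: '2026-07-08T00:00:00.000Z', revenue: 0,
})); // true
```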


Node.js Analytics API

// src/analytics/queries.ts
import { createClient, ClickHouseClient } from '@clickhouse/client';

const ch: ClickHouseClient = createClient({
  host: process.env.CLICKHOUSE_HOST ?? 'http://clickhouse:8123',
  username: 'analytics',
  password: process.env.CLICKHOUSE_PASSWORD,
  database: 'analytics',
  clickhouse_settings: {
    async_insert: 1,
    wait_for_async_insert: 0,
    max_execution_time: 30,
  },
});

interface DashboardMetrics {
  totalRevenue: number;
  uniqueUsers: number;
  eventBreakdown: { eventType: string; count: number; revenue: number }[];
  dailyTrend: { date: string; revenue: number; users: number }[];
}

export async function getDashboardMetrics(
  tenantId: string,
  days: number = 30,
): Promise<DashboardMetrics> {
  const query = `
    SELECT
      sum(total_revenue)        AS total_revenue,
      sum(unique_users)         AS unique_users,  -- upper bound: sums per-day uniques
      groupArray((event_type, toUInt64(event_count), total_revenue)) AS breakdown
    FROM daily_stats
    WHERE
      tenant_id = {tenantId: String}
      AND date >= today() - {days: UInt32}
  `;

  const result = await ch.query({
    query,
    query_params: { tenantId, days },
    format: 'JSONEachRow',
  });

  const rows = await result.json<{
    total_revenue: string;
    unique_users: string;
    breakdown: Array<[string, string, string]>;
  }>();

  const row = rows[0];

  // Daily trend query
  const trendResult = await ch.query({
    query: `
      SELECT
        date,
        sum(total_revenue) AS revenue,
        sum(unique_users)  AS users
      FROM daily_stats
      WHERE tenant_id = {tenantId: String}
        AND date >= today() - {days: UInt32}
      GROUP BY date
      ORDER BY date ASC
    `,
    query_params: { tenantId, days },
    format: 'JSONEachRow',
  });

  const trend = await trendResult.json<{ date: string; revenue: string; users: string }>();

  return {
    totalRevenue: parseFloat(row.total_revenue),
    uniqueUsers: parseInt(row.unique_users),
    eventBreakdown: row.breakdown.map(([eventType, count, revenue]) => ({
      eventType,
      count: parseInt(count),
      revenue: parseFloat(revenue),
    })),
    dailyTrend: trend.map((t) => ({
      date: t.date,
      revenue: parseFloat(t.revenue),
      users: parseInt(t.users),
    })),
  };
}
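These queries are cheap in ClickHouse, but many users reload the same tenant dashboard, so a thin TTL cache in the API layer still pays off. A minimal sketch — our own helper, not part of @clickhouse/client:

```typescript
// Tiny per-key TTL cache. The clock is injectable for testability.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  getOrLoad(key: string, load: () => V): V {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = load();
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Instantiated with `V = Promise<DashboardMetrics>` and keyed by `` `${tenantId}:${days}` ``, it caches the in-flight promise, so concurrent requests for the same tenant share one ClickHouse query. One trade-off: a rejected promise stays cached until it expires, which is another reason to keep the TTL short (30–60s).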


Funnel Analysis

Funnel analysis (conversion rates through a defined sequence of events) is ClickHouse's specialty:

-- Calculate signup → onboarding → purchase funnel
SELECT
    tenant_id,
    uniqExactIf(user_id, event_type = 'signup')               AS step1_signup,
    uniqExactIf(user_id, event_type = 'onboarding_complete')  AS step2_onboarding,
    uniqExactIf(user_id, event_type = 'first_purchase')       AS step3_purchase,

    -- Conversion rates
    step2_onboarding / step1_signup * 100                     AS signup_to_onboarding_pct,
    step3_purchase   / step2_onboarding * 100                 AS onboarding_to_purchase_pct

FROM events
WHERE
    tenant_id = 'tenant-abc'
    AND date >= today() - 30
    AND event_type IN ('signup', 'onboarding_complete', 'first_purchase')
GROUP BY tenant_id;

-- Ordered funnel with time constraints (must complete steps within 7 days)
SELECT
    level,
    count() AS users
FROM (
    SELECT
        user_id,
        windowFunnel(7 * 24 * 3600)(  -- 7 day window in seconds
            toUnixTimestamp(timestamp),
            event_type = 'signup',
            event_type = 'onboarding_complete',
            event_type = 'first_purchase'
        ) AS level
    FROM events
    WHERE tenant_id = 'tenant-abc' AND date >= today() - 30
    GROUP BY user_id
)
GROUP BY level
ORDER BY level;
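windowFunnel returns, per user, the longest prefix of the step conditions matched in order within the window. A small TypeScript model of that default behaviour (ignoring ClickHouse's strict_* modes) is handy for unit-testing funnel definitions against fixtures before running them at scale:

```typescript
interface FunnelEvent { ts: number; type: string } // ts = unix seconds

// Longest ordered prefix of `steps` completed within `windowSec` of a
// step-1 event. Mirrors windowFunnel's default mode; a simplification.
function funnelLevel(events: FunnelEvent[], steps: string[], windowSec: number): number {
  const sorted = [...events].sort((a, b) => a.ts - b.ts);
  let best = 0;
  for (let i = 0; i < sorted.length; i++) {
    if (sorted[i].type !== steps[0]) continue; // try each step-1 event as a start
    const deadline = sorted[i].ts + windowSec;
    let level = 1;
    for (let j = i + 1; j < sorted.length && sorted[j].ts <= deadline; j++) {
      if (sorted[j].type === steps[level]) level += 1;
      if (level === steps.length) break;
    }
    best = Math.max(best, level);
  }
  return best;
}
```

Note the same subtlety the SQL has: an out-of-order event (purchase before onboarding) does not advance the funnel, and a user with no step-1 event scores 0.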

Retention Cohort Analysis

-- Weekly retention cohorts
WITH
    cohorts AS (
        -- First signup week per user
        SELECT user_id, toStartOfWeek(min(date)) AS cohort_week
        FROM events
        WHERE tenant_id = 'tenant-abc' AND event_type = 'signup'
        GROUP BY user_id
    ),
    activity AS (
        SELECT DISTINCT user_id, toStartOfWeek(date) AS activity_week
        FROM events
        WHERE tenant_id = 'tenant-abc'
    ),
    cohort_sizes AS (
        SELECT cohort_week, count() AS users_in_cohort
        FROM cohorts
        GROUP BY cohort_week
    )
SELECT
    c.cohort_week,
    dateDiff('week', c.cohort_week, a.activity_week) AS week_number,
    -- Denominator is the full cohort size, not just users active this week
    any(s.users_in_cohort)    AS users_in_cohort,
    count(DISTINCT c.user_id) AS retained_users,
    retained_users / users_in_cohort * 100 AS retention_pct
FROM cohorts AS c
JOIN activity AS a ON c.user_id = a.user_id
JOIN cohort_sizes AS s ON c.cohort_week = s.cohort_week
WHERE week_number BETWEEN 0 AND 8
GROUP BY c.cohort_week, a.activity_week
ORDER BY c.cohort_week, week_number;
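The same cohort arithmetic in application code is useful for validating the SQL against a small fixture. Week boundaries here are simplified to floor(ts / 7 days) rather than ClickHouse's Monday-based toStartOfWeek:

```typescript
interface Activity { userId: string; ts: number } // ts = unix seconds

const WEEK_SECONDS = 7 * 24 * 3600;

// retention.get(cohortWeek).get(weekNumber) = % of that cohort active in
// that week. Cohort = the week of a user's first signup. Illustrative.
function weeklyRetention(
  signups: Activity[],
  activity: Activity[],
): Map<number, Map<number, number>> {
  const cohortOf = new Map<string, number>();
  for (const s of signups) {
    const week = Math.floor(s.ts / WEEK_SECONDS);
    const prev = cohortOf.get(s.userId);
    if (prev === undefined || week < prev) cohortOf.set(s.userId, week);
  }

  const cohortSize = new Map<number, number>();
  for (const week of cohortOf.values()) {
    cohortSize.set(week, (cohortSize.get(week) ?? 0) + 1);
  }

  const retained = new Map<number, Map<number, Set<string>>>();
  for (const a of activity) {
    const cohort = cohortOf.get(a.userId);
    if (cohort === undefined) continue; // activity from non-signup users
    const weekNumber = Math.floor(a.ts / WEEK_SECONDS) - cohort;
    if (weekNumber < 0) continue;
    const byWeek = retained.get(cohort) ?? new Map<number, Set<string>>();
    retained.set(cohort, byWeek);
    const users = byWeek.get(weekNumber) ?? new Set<string>();
    byWeek.set(weekNumber, users);
    users.add(a.userId);
  }

  const result = new Map<number, Map<number, number>>();
  for (const [cohort, byWeek] of retained) {
    const pct = new Map<number, number>();
    for (const [weekNumber, users] of byWeek) {
      pct.set(weekNumber, (users.size / (cohortSize.get(cohort) ?? 1)) * 100);
    }
    result.set(cohort, pct);
  }
  return result;
}
```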

Infrastructure Cost Estimates

| Scale (events/month) | ClickHouse Setup | Kafka Setup | Monthly Cost |
|---|---|---|---|
| <100M | c6i.xlarge (4 vCPU, 8GB) | MSK kafka.t3.small | $150–$300 |
| 100M–1B | 3-node c6i.2xlarge cluster | MSK kafka.m5.large (3 brokers) | $600–$1,200 |
| 1B–10B | 6-node c6i.4xlarge cluster | MSK kafka.m5.xlarge (6 brokers) | $2,500–$5,000 |
| >10B | ClickHouse Cloud / Altinity | Confluent Cloud | $8,000–$25,000 |

Cost optimization:

  • Use S3-backed ClickHouse tiered storage for cold data (90% cheaper)
  • Enable Kafka log compaction on CDC topics
  • Compress with Snappy/LZ4 (50–70% size reduction)
  • TTL policies to auto-expire old data
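The tiered-storage point, in rough numbers. Prices below are illustrative public list prices (around $0.10/GB-month for block storage vs $0.023/GB-month for S3 standard), not quotes, for a 10TB dataset with a ~30-day hot window:

```typescript
// Illustrative storage-cost math for tiered ClickHouse storage.
const totalGb = 10_000;          // ~10 TB of compressed parts
const hotFractionGb = 1_000;     // last ~30 days kept on local SSD
const ssdPerGbMonth = 0.10;      // block storage, ballpark
const s3PerGbMonth = 0.023;      // object storage, ballpark

const allHot = totalGb * ssdPerGbMonth;
const tiered = hotFractionGb * ssdPerGbMonth + (totalGb - hotFractionGb) * s3PerGbMonth;

console.log(allHot.toFixed(0));  // 1000 ($/month, everything on SSD)
console.log(tiered.toFixed(0));  // 307 ($/month, cold data on S3)
```

Queries over cold parts are slower (object-store latency), so keep the hot window at least as long as your default dashboard range.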

Working With Viprasol

Our data engineering team builds production analytics pipelines for SaaS products — from initial ClickHouse setup to real-time dashboards that handle billion-event scale.

What we deliver:

  • ClickHouse cluster design (schema, partitioning, materialized views)
  • Kafka topic architecture and producer/consumer patterns
  • Multi-tenant analytics APIs with row-level isolation
  • Grafana or custom React dashboard implementations
  • Migration from PostgreSQL to ClickHouse with zero downtime

Talk about your analytics requirements · AI and data engineering services



About the Author

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
