SaaS Analytics Architecture: Clickstream, Cohort Analysis, and Self-Serve Dashboards
Build a production SaaS analytics stack: implement clickstream collection with event schemas, design cohort retention tables in PostgreSQL or BigQuery, build funnel analysis queries, and deliver self-serve BI dashboards.
Most SaaS companies start analytics the wrong way: they drop an analytics SDK into the frontend, watch events appear in a dashboard, and call it done. Then the questions start arriving ("What's our D30 retention for users who completed onboarding?") and the answer is "we'd need to build that."
A proper analytics architecture starts with the questions you need to answer, then works backward to the schema design, collection pipeline, and dashboard layer.
The Analytics Architecture Stack
Collection Layer: Client events, server events, and webhook events
Transport Layer: Event queue (Kafka/SQS) or direct write
Storage Layer: Raw events → transformed tables → aggregations
Query Layer: SQL + dbt models + cached materialized views
Presentation Layer: BI tool (Metabase/Superset) or custom dashboards
Event Schema Design
The most important architectural decision is your event schema. A poor schema is expensive to fix later.
// src/analytics/event-schemas.ts
// Events follow the noun-verb pattern: object_action
// Base event: every event has these fields
interface BaseEvent {
event_id: string; // UUID, for deduplication
event_type: string; // "user_signed_up", "project_created", etc.
occurred_at: string; // ISO 8601, when the event happened (not when received)
received_at: string; // ISO 8601, when the server received it
session_id: string; // Groups events within a session
user_id: string | null; // Null for anonymous events
anonymous_id: string; // Always present, even before login
tenant_id: string | null;
properties: Record<string, unknown>; // Event-specific data
context: EventContext;
}
interface EventContext {
page: {
url: string;
path: string;
referrer: string;
title: string;
};
user_agent: string;
ip: string; // Used for geo enrichment, then hashed for privacy
library: {
name: string;
version: string;
};
campaign: {
source: string | null; // UTM source
medium: string | null;
name: string | null;
term: string | null;
content: string | null;
};
}
// Typed event definitions
type UserSignedUpEvent = BaseEvent & {
event_type: "user_signed_up";
properties: {
email: string;
plan: "free" | "starter" | "growth";
signup_method: "email" | "google" | "github";
referral_code: string | null;
time_to_signup_seconds: number; // Time from landing to signup
};
};
type ProjectCreatedEvent = BaseEvent & {
event_type: "project_created";
properties: {
project_id: string;
project_type: string;
is_first_project: boolean;
template_used: string | null;
};
};
type SubscriptionUpgradedEvent = BaseEvent & {
event_type: "subscription_upgraded";
properties: {
from_plan: string;
to_plan: string;
mrr_delta_usd: number; // Change in monthly recurring revenue
days_since_signup: number;
trigger: "organic" | "in-app-prompt" | "sales-outreach";
};
};
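The interfaces above are erased at compile time, so payloads arriving from clients still need a structural check before they are enqueued. A minimal sketch of such a guard (a hypothetical helper, not part of the schemas above), checking only the fields every event must carry:

```typescript
// Hypothetical runtime guard: validates the structural minimum of a BaseEvent
// before it is accepted into the pipeline.
export function isValidBaseEvent(input: unknown): boolean {
  if (typeof input !== "object" || input === null) return false;
  const e = input as Record<string, unknown>;
  const eventType = e.event_type;
  const occurredAt = e.occurred_at;
  const anonymousId = e.anonymous_id;
  return (
    typeof eventType === "string" &&
    /^[a-z0-9]+(_[a-z0-9]+)+$/.test(eventType) && // object_action snake_case
    typeof occurredAt === "string" &&
    !Number.isNaN(Date.parse(occurredAt)) && // must be a parseable timestamp
    typeof anonymousId === "string" &&
    anonymousId.length > 0
  );
}

console.log(
  isValidBaseEvent({
    event_type: "user_signed_up",
    occurred_at: "2026-09-01T12:00:00Z",
    anonymous_id: "anon-1",
  })
); // true
console.log(isValidBaseEvent({ event_type: "Signup!" })); // false
```

Rejecting malformed events at the edge keeps the raw events table clean; a dead-letter queue for rejects preserves them for debugging.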
Server-Side Event Collection
// src/analytics/collector.ts
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import { v4 as uuidv4 } from "uuid";
import type { NextFunction, Request, Response } from "express";
const sqs = new SQSClient({ region: "us-east-1" });
export class EventCollector {
private readonly queueUrl: string;
constructor(queueUrl: string) {
this.queueUrl = queueUrl;
}
async track<T extends BaseEvent>(
event: Omit<T, "event_id" | "received_at">
): Promise<void> {
const enrichedEvent: T = {
...event,
event_id: uuidv4(),
received_at: new Date().toISOString(),
} as T;
// Critical business events: always server-side to avoid ad-blocker losses
// (signup, payment, upgrade events must be 100% captured).
// Note: MessageGroupId / MessageDeduplicationId require a FIFO queue
await sqs.send(
new SendMessageCommand({
QueueUrl: this.queueUrl,
MessageBody: JSON.stringify(enrichedEvent),
MessageGroupId: enrichedEvent.user_id ?? enrichedEvent.anonymous_id,
MessageDeduplicationId: enrichedEvent.event_id,
})
);
}
}
// Middleware: attach a collector and resolve user identity for route handlers
// (assumes the Express Request type is augmented with `analytics` and `identity`)
export async function analyticsMiddleware(
req: Request,
res: Response,
next: NextFunction
) {
const session = await getSession(req); // your auth layer's session lookup
req.analytics = new EventCollector(process.env.ANALYTICS_QUEUE_URL!);
// Persist the anonymous ID; without setting the cookie, a fresh UUID would
// be generated on every request and anonymous events could never be stitched
let anonymousId = req.cookies["anon_id"];
if (!anonymousId) {
anonymousId = uuidv4();
res.cookie("anon_id", anonymousId, { httpOnly: true, sameSite: "lax" });
}
req.identity = {
userId: session?.user?.id ?? null,
anonymousId,
tenantId: session?.user?.tenantId ?? null,
};
next();
}
Raw Events Storage Schema
-- Raw events table โ append-only, never update
-- Partitioned by date for query performance
CREATE TABLE events (
event_id UUID NOT NULL,
event_type TEXT NOT NULL,
occurred_at TIMESTAMPTZ NOT NULL,
received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
user_id UUID,
anonymous_id TEXT NOT NULL,
tenant_id UUID,
session_id TEXT,
properties JSONB NOT NULL DEFAULT '{}',
context JSONB NOT NULL DEFAULT '{}',
-- Enriched fields (added by ETL pipeline)
country_code CHAR(2),
city TEXT,
device_type TEXT, -- 'mobile' | 'tablet' | 'desktop'
browser TEXT,
os TEXT
) PARTITION BY RANGE (occurred_at);
-- Create monthly partitions (automate this with pg_partman)
CREATE TABLE events_2026_09
PARTITION OF events
FOR VALUES FROM ('2026-09-01') TO ('2026-10-01');
-- Indexes on common query patterns
CREATE INDEX ON events (user_id, occurred_at DESC);
CREATE INDEX ON events (tenant_id, occurred_at DESC);
CREATE INDEX ON events (event_type, occurred_at DESC);
CREATE INDEX ON events USING GIN (properties); -- For JSONB property queries
Cohort Retention Analysis
Cohort retention is the most important metric for understanding product health. The question: of users who signed up in week W, what % are still active in week W+N?
-- Build a retention cohort table
-- "Active" = performed any key action in that week
WITH cohorts AS (
-- Assign each user to their signup cohort (weekly buckets)
SELECT
user_id,
DATE_TRUNC('week', occurred_at)::DATE AS cohort_week
FROM events
WHERE event_type = 'user_signed_up'
AND occurred_at >= NOW() - INTERVAL '16 weeks'
),
weekly_activity AS (
-- Find every week each user was active (performed any key event)
SELECT DISTINCT
e.user_id,
DATE_TRUNC('week', e.occurred_at)::DATE AS activity_week
FROM events e
WHERE e.event_type IN (
'project_created', 'export_completed', 'collaborator_invited',
'integration_connected', 'report_viewed'
)
AND e.occurred_at >= NOW() - INTERVAL '16 weeks'
),
retention_data AS (
SELECT
c.cohort_week,
wa.activity_week,
-- Week number since signup (0 = signup week, 1 = first full week after, etc.)
-- activity_week and cohort_week are DATEs, so subtraction yields integer
-- days directly; EXTRACT(EPOCH ...) would fail on an integer
(wa.activity_week - c.cohort_week) / 7 AS week_number,
COUNT(DISTINCT wa.user_id) AS active_users
FROM cohorts c
JOIN weekly_activity wa ON wa.user_id = c.user_id
AND wa.activity_week >= c.cohort_week
GROUP BY c.cohort_week, wa.activity_week
),
cohort_sizes AS (
SELECT cohort_week, COUNT(*) AS cohort_size
FROM cohorts
GROUP BY cohort_week
)
SELECT
rd.cohort_week,
cs.cohort_size,
rd.week_number,
rd.active_users,
ROUND(100.0 * rd.active_users / cs.cohort_size, 1) AS retention_pct
FROM retention_data rd
JOIN cohort_sizes cs ON cs.cohort_week = rd.cohort_week
ORDER BY rd.cohort_week, rd.week_number;
-- Result format (pivot for heatmap display):
-- cohort_week | cohort_size | week_0 | week_1 | week_2 | ... | week_12
-- 2026-07-07 | 412 | 100% | 68% | 52% | ... | 28%
-- 2026-07-14 | 389 | 100% | 71% | 55% | ... | --
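The query returns long-format rows (one per cohort/week pair), so the heatmap pivot happens client-side. A minimal sketch, assuming a row type mirroring the SELECT list above:

```typescript
// Pivot long-format retention rows into one series per cohort,
// indexed by weeks since signup, for heatmap rendering.
interface RetentionRow {
  cohort_week: string;
  cohort_size: number;
  week_number: number;
  retention_pct: number;
}

function pivotRetention(rows: RetentionRow[]): Map<string, number[]> {
  const heatmap = new Map<string, number[]>();
  for (const row of rows) {
    const series = heatmap.get(row.cohort_week) ?? [];
    series[row.week_number] = row.retention_pct; // index = weeks since signup
    heatmap.set(row.cohort_week, series);
  }
  return heatmap;
}

const sample: RetentionRow[] = [
  { cohort_week: "2026-07-06", cohort_size: 412, week_number: 0, retention_pct: 100 },
  { cohort_week: "2026-07-06", cohort_size: 412, week_number: 1, retention_pct: 68 },
];
console.log(pivotRetention(sample).get("2026-07-06")); // [ 100, 68 ]
```

Gaps in a series (a cohort with no activity in some week) stay as sparse holes, which the dashboard can render as 0% or "--" as it prefers.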
Funnel Analysis
-- Conversion funnel: signup → first project → first export → upgrade
-- Shows drop-off at each step
WITH funnel_base AS (
SELECT user_id, MIN(occurred_at) AS signed_up_at
FROM events
WHERE event_type = 'user_signed_up'
AND occurred_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id
),
step2 AS (
SELECT DISTINCT e.user_id
FROM events e
JOIN funnel_base f ON f.user_id = e.user_id
WHERE e.event_type = 'project_created'
AND e.occurred_at > f.signed_up_at
AND e.occurred_at < f.signed_up_at + INTERVAL '7 days'
),
step3 AS (
SELECT DISTINCT e.user_id
FROM events e
JOIN step2 s ON s.user_id = e.user_id
WHERE e.event_type = 'export_completed'
AND e.occurred_at >= (
SELECT MIN(occurred_at) FROM events
WHERE user_id = e.user_id AND event_type = 'project_created'
)
AND e.occurred_at < (
SELECT MIN(occurred_at) FROM events
WHERE user_id = e.user_id AND event_type = 'user_signed_up'
) + INTERVAL '14 days'
),
step4 AS (
SELECT DISTINCT e.user_id
FROM events e
JOIN step3 s ON s.user_id = e.user_id
WHERE e.event_type = 'subscription_upgraded'
AND e.occurred_at >= (
SELECT MIN(occurred_at) FROM events
WHERE user_id = e.user_id AND event_type = 'user_signed_up'
)
)
SELECT
1 AS step_order, 'signed_up' AS step, COUNT(*) AS users, 100.0 AS pct FROM funnel_base
UNION ALL
SELECT
2, 'created_project', COUNT(*), ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM funnel_base), 1)
FROM step2
UNION ALL
SELECT
3, 'completed_export', COUNT(*), ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM funnel_base), 1)
FROM step3
UNION ALL
SELECT
4, 'upgraded', COUNT(*), ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM funnel_base), 1)
FROM step4
-- step_order preserves funnel order; ORDER BY step would sort alphabetically
ORDER BY step_order;
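The percentages above are relative to signups. Step-over-step conversion (each step relative to the previous one) often pinpoints the leaky step better; a sketch of that arithmetic over the query's output, assuming nonzero counts at each step:

```typescript
// Compute step-over-step conversion from ordered funnel counts.
interface FunnelStep {
  step: string;
  users: number;
}

function stepConversion(
  steps: FunnelStep[]
): { step: string; pctOfPrevious: number }[] {
  return steps.map((s, i) => ({
    step: s.step,
    // First step converts at 100% by definition; one decimal place,
    // assumes each previous step has a nonzero count
    pctOfPrevious:
      i === 0 ? 100 : Math.round((1000 * s.users) / steps[i - 1].users) / 10,
  }));
}

const funnel: FunnelStep[] = [
  { step: "signed_up", users: 400 },
  { step: "created_project", users: 240 },
  { step: "completed_export", users: 120 },
  { step: "upgraded", users: 30 },
];
console.log(stepConversion(funnel).map((s) => s.pctOfPrevious)); // [ 100, 60, 50, 25 ]
```

Here the signup-to-project step loses 40% while export-to-upgrade loses 75%, which is a different story than the cumulative percentages tell.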
Self-Serve Dashboard Architecture
// src/analytics/dashboard.service.ts
// Serve analytics data to frontend dashboards with caching
interface DashboardQuery {
metric: "cohort_retention" | "funnel" | "dau_mau" | "mrr_growth";
tenantId?: string; // omitted = platform-wide (admin only)
period: "7d" | "30d" | "90d";
granularity: "day" | "week" | "month";
}
export class DashboardService {
constructor(
private readonly db: Pool,
private readonly cache: Redis
) {}
async query(params: DashboardQuery): Promise<unknown> {
const cacheKey = `analytics:${JSON.stringify(params)}`;
// Cache dashboard queries: they're expensive, and slightly stale data is acceptable
const cached = await this.cache.get(cacheKey);
if (cached) return JSON.parse(cached);
const result = await this.executeQuery(params);
// Cache for 15 minutes (adjust based on data freshness needs)
await this.cache.setex(cacheKey, 900, JSON.stringify(result));
return result;
}
private async executeQuery(params: DashboardQuery): Promise<unknown> {
switch (params.metric) {
case "cohort_retention":
return this.getCohortRetention(params); // runs the retention SQL above
case "dau_mau":
return this.getDauMauRatio(params);
case "mrr_growth":
return this.getMrrGrowth(params);
case "funnel":
return this.getFunnelData(params);
default:
throw new Error(`Unknown metric: ${params.metric}`);
}
}
private async getDauMauRatio(params: DashboardQuery) {
// params.period is a closed union ("7d" | "30d" | "90d"), so interpolating
// it into the INTERVAL literal is injection-safe; tenant_id stays parameterized
const { rows } = await this.db.query(`
WITH daily_active AS (
SELECT
DATE_TRUNC('day', occurred_at)::DATE AS date,
COUNT(DISTINCT user_id) AS dau
FROM events
WHERE occurred_at >= NOW() - INTERVAL '${params.period}'
${params.tenantId ? "AND tenant_id = $1" : ""}
GROUP BY 1
),
monthly_active AS (
SELECT
DATE_TRUNC('month', occurred_at)::DATE AS month,
COUNT(DISTINCT user_id) AS mau
FROM events
WHERE occurred_at >= NOW() - INTERVAL '${params.period}'
${params.tenantId ? "AND tenant_id = $1" : ""}
GROUP BY 1
)
SELECT
d.date,
d.dau,
m.mau,
ROUND(100.0 * d.dau / NULLIF(m.mau, 0), 1) AS dau_mau_ratio
FROM daily_active d
JOIN monthly_active m ON DATE_TRUNC('month', d.date) = m.month
ORDER BY d.date DESC
`, params.tenantId ? [params.tenantId] : []);
return rows;
}
}
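One subtlety in the caching above: `JSON.stringify` is sensitive to property order, so two logically identical `DashboardQuery` objects can produce different cache keys and silently halve the hit rate. A hypothetical helper (not in the service above) that builds a canonical key from sorted entries:

```typescript
// Build an order-insensitive cache key from a flat params object.
function stableCacheKey(
  prefix: string,
  params: Record<string, unknown>
): string {
  const parts = Object.keys(params)
    .sort() // canonical order regardless of how the object was constructed
    .filter((k) => params[k] !== undefined)
    .map((k) => `${k}=${String(params[k])}`);
  return `${prefix}:${parts.join("&")}`;
}

console.log(stableCacheKey("analytics", { period: "30d", metric: "funnel" }));
// analytics:metric=funnel&period=30d
console.log(stableCacheKey("analytics", { metric: "funnel", period: "30d" }));
// analytics:metric=funnel&period=30d  (same key, either property order)
```

For nested params a recursive canonical serializer would be needed, but dashboard query objects are usually flat enough for this to suffice.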
BigQuery for Scale
When event volume exceeds roughly 100M rows/month, consider moving analytics storage to BigQuery:
-- BigQuery schema (uses STRUCT for nested data)
CREATE TABLE IF NOT EXISTS `project.analytics.events` (
event_id STRING NOT NULL,
event_type STRING NOT NULL,
occurred_at TIMESTAMP NOT NULL,
user_id STRING,
tenant_id STRING,
properties JSON,
context STRUCT<
page STRUCT<
url STRING,
path STRING,
referrer STRING
>,
campaign STRUCT<
source STRING,
medium STRING,
name STRING
>,
device_type STRING,
country STRING
>
)
PARTITION BY DATE(occurred_at)
CLUSTER BY event_type, tenant_id; -- Cluster for common filter patterns
-- BigQuery retention query (same logic, BigQuery syntax)
SELECT
cohort_week,
week_number,
active_users,
cohort_size,
ROUND(100.0 * active_users / cohort_size, 1) AS retention_pct
FROM (
SELECT
c.cohort_week,
-- cohort_week and activity_week are DATEs; DATE_DIFF supports the WEEK part
-- (TIMESTAMP_DIFF does not accept WEEK)
DATE_DIFF(wa.activity_week, c.cohort_week, WEEK) AS week_number,
COUNT(DISTINCT wa.user_id) AS active_users,
MAX(cs.cohort_size) AS cohort_size
FROM `project.analytics.cohorts` c
JOIN `project.analytics.weekly_activity` wa ON wa.user_id = c.user_id
JOIN `project.analytics.cohort_sizes` cs ON cs.cohort_week = c.cohort_week
GROUP BY 1, 2
)
ORDER BY cohort_week, week_number;
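The week_number arithmetic can be sanity-checked off-database. A minimal sketch, assuming both inputs are week-truncated ISO date strings (as the staged cohort tables produce):

```typescript
// Whole weeks between two week-truncated ISO dates.
// Mirrors the SQL week_number: (activity_week - cohort_week) / 7.
function weekNumber(cohortWeek: string, activityWeek: string): number {
  const msPerWeek = 7 * 24 * 60 * 60 * 1000;
  // Date.parse reads bare ISO dates as UTC midnight, so the difference
  // is an exact multiple of a week when both dates are week-truncated
  return Math.round((Date.parse(activityWeek) - Date.parse(cohortWeek)) / msPerWeek);
}

console.log(weekNumber("2026-07-06", "2026-07-06")); // 0 (signup week)
console.log(weekNumber("2026-07-06", "2026-07-20")); // 2
```

Running a handful of known cohort pairs through both the SQL and this function is a cheap way to catch week-boundary convention mismatches (e.g. Monday- vs Sunday-start weeks) between warehouses.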
See Also
- Product-Led Growth Engineering โ activation funnel
- SaaS Metrics Benchmarks โ what good looks like
- Data Pipeline Architecture โ ETL/ELT pipelines
- PostgreSQL Performance Tuning โ query optimization
Working With Viprasol
Analytics architecture is more than dropping an SDK into your app. We design event schemas that answer the questions you'll need 12 months from now, build collection pipelines that don't miss critical business events, and deliver self-serve dashboards that your team actually uses to make decisions.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.