SaaS Data Warehouse in 2026: dbt, Snowflake vs BigQuery vs Redshift, and ELT Pipelines
Build a SaaS data warehouse: dbt model architecture, Snowflake vs BigQuery vs Redshift comparison, Fivetran/Airbyte ELT pipelines, and data quality testing for analytics.
Most SaaS companies outgrow Postgres as their analytics database somewhere between $1M and $5M ARR. The combination of complex join-heavy queries, growing data volume, and the need to blend data from multiple sources (Stripe, Salesforce, your app DB, Intercom) makes a dedicated data warehouse essential.
The modern data stack in 2026: Fivetran or Airbyte extracts from your sources and loads raw data into Snowflake, BigQuery, or Redshift. dbt transforms that raw data into clean, documented, tested analytics models. BI tools (Metabase, Looker, Superset) query the dbt models. This post covers each layer with production patterns.
Warehouse Comparison: Snowflake vs BigQuery vs Redshift
| | Snowflake | BigQuery | Redshift Serverless |
|---|---|---|---|
| Pricing model | Credits per second (compute) + storage | Per TB scanned (on-demand) or flat reservation | Per RPU-hour + storage |
| Scaling | Instant virtual warehouse resize | Automatic (serverless) | Automatic (serverless v2) |
| Concurrency | Multi-cluster warehouses | Unlimited slots (reservation) | Managed automatically |
| Time travel | 90 days (Enterprise) | 7 days | N/A (snapshots) |
| Semi-structured data | Excellent (VARIANT type, FLATTEN) | Excellent (REPEATED/RECORD types) | Good (SUPER type) |
| dbt support | ✅ Best-in-class | ✅ Excellent | ✅ Good |
| Cloud lock-in | Multi-cloud | GCP only | AWS only |
| Typical cost (mid-size workload) | ~$800/month | ~$500/month | ~$600/month |
| Typical cost (large workload) | ~$4,000/month | ~$3,000/month | ~$3,500/month |
Recommendation by use case:
- Already on AWS, existing Redshift: stick with Redshift Serverless
- Already on GCP / BigQuery ecosystem: BigQuery
- Multi-cloud or greenfield: Snowflake (best dbt integration, time travel, data sharing)
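The semi-structured data row is worth a concrete illustration. Here is a minimal Snowflake sketch that explodes nested Stripe invoice line items with LATERAL FLATTEN; the table and column names (stripe_raw.invoices, a VARIANT column named lines) are assumptions for illustration, not a guaranteed Fivetran layout:

-- Snowflake: explode nested JSON with FLATTEN (illustrative; schema assumed)
SELECT
    i.id                        AS invoice_id,
    line.value:price:id::TEXT   AS price_id,
    line.value:amount::INTEGER  AS amount_cents
FROM stripe_raw.invoices AS i,
     LATERAL FLATTEN(input => i.lines:data) AS line;

The BigQuery equivalent uses UNNEST over a REPEATED field; Redshift reaches into SUPER columns with PartiQL dot notation.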
ELT Pipeline: Fivetran vs Airbyte
| | Fivetran | Airbyte |
|---|---|---|
| Pricing | Per Monthly Active Row (~$500–$2K/month typical) | Open source (free) or Airbyte Cloud |
| Connector quality | Best-in-class (Stripe, Salesforce, HubSpot) | Good (600+ connectors) |
| Maintenance | Zero; Fivetran handles schema changes | You maintain custom connectors |
| Custom sources | Python SDK | Custom connector framework |
| When to choose | When budget allows; zero infra overhead | When cost matters more; technical team |
Fivetran: Connecting Stripe to Snowflake
# fivetran_config.yaml (declarative connector config)
# Configured in the Fivetran dashboard; no YAML is needed for standard connectors.
# This represents the equivalent configuration options:
connector:
  service: stripe
  schema: stripe_raw
  destination: snowflake_prod
  sync_frequency: 360  # minutes (6 hours is enough for most SaaS data)
  tables:
    - invoices
    - customers
    - subscriptions
    - charges
    - refunds
    - disputes
    - products
    - prices

# Result in Snowflake:
#   stripe_raw.invoices, stripe_raw.customers, etc.
#   All columns from the Stripe API, with nested JSON expanded to columns.
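Once the raw tables land, declare them as dbt sources so downstream models can reference them and dbt can monitor sync freshness. A minimal sketch, assuming the Fivetran destination database is named raw (adjust to your setup); _fivetran_synced is the timestamp column Fivetran adds to every table:

# models/staging/stripe/_stripe__sources.yml (minimal sketch; database name assumed)
version: 2
sources:
  - name: stripe_raw
    database: raw
    schema: stripe_raw
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: { count: 12, period: hour }
    tables:
      - name: customers
      - name: invoices
      - name: subscriptions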
Airbyte: Custom Source Connector
# airbyte_connectors/source-internal-api/source.py
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests

from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream
from airbyte_cdk.sources.streams.http.requests_native_auth import TokenAuthenticator


class UsersStream(HttpStream):
    # Trailing slash matters: the CDK urljoins path() onto url_base
    url_base = "https://api.myapp.com/internal/"
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "users"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping]:
        # Page-based pagination driven by the API's hasNextPage flag
        data = response.json()
        if data["pagination"]["hasNextPage"]:
            return {"page": data["pagination"]["page"] + 1}
        return None

    def request_params(self, next_page_token=None, **kwargs) -> Mapping:
        params = {"perPage": 500}
        if next_page_token:
            params["page"] = next_page_token["page"]
        return params

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        yield from response.json()["data"]

    def get_json_schema(self) -> Mapping:
        return {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "email": {"type": "string"},
                "plan": {"type": "string"},
                "createdAt": {"type": "string", "format": "date-time"},
                "mrr": {"type": "number"},
            },
        }


class SourceInternalAPI(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Optional[Any]]:
        try:
            resp = requests.get(
                f"{config['api_url']}/healthz",
                headers={"Authorization": f"Bearer {config['api_key']}"},
                timeout=5,
            )
            return resp.ok, None
        except Exception as e:
            return False, str(e)

    def streams(self, config) -> List[Stream]:
        # Bearer-token auth matching the check_connection call above
        auth = TokenAuthenticator(token=config["api_key"])
        return [UsersStream(authenticator=auth)]
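The config['api_url'] and config['api_key'] keys come from the connector's spec, which Airbyte uses to render the setup form. A hypothetical spec matching those keys:

# airbyte_connectors/source-internal-api/spec.yaml (hypothetical; matches the config keys above)
documentationUrl: https://docs.airbyte.com
connectionSpecification:
  $schema: http://json-schema.org/draft-07/schema#
  title: Internal API Source Spec
  type: object
  required: [api_url, api_key]
  properties:
    api_url:
      type: string
      description: Base URL of the internal API
    api_key:
      type: string
      airbyte_secret: true
      description: Bearer token used for authentication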
dbt Project Structure
models/
├── staging/          -- 1:1 with source tables; rename columns, cast types, no joins
│   ├── stripe/
│   │   ├── stg_stripe__customers.sql
│   │   ├── stg_stripe__invoices.sql
│   │   └── stg_stripe__subscriptions.sql
│   └── app/
│       ├── stg_app__users.sql
│       └── stg_app__orders.sql
├── intermediate/     -- joins and business logic; not exposed to BI
│   ├── int_subscriptions_enriched.sql
│   └── int_revenue_events.sql
└── marts/            -- final models for BI; organized by business domain
    ├── finance/
    │   ├── fct_revenue.sql    -- fact table: one row per revenue event
    │   └── dim_customers.sql  -- dimension table: one row per customer
    └── product/
        ├── fct_events.sql
        └── fct_active_users.sql
Staging Model
-- models/staging/stripe/stg_stripe__subscriptions.sql
-- Rename, cast, and filter raw Stripe data. No business logic.
WITH source AS (
    SELECT * FROM {{ source('stripe_raw', 'subscriptions') }}
),

renamed AS (
    SELECT
        id AS subscription_id,
        customer AS stripe_customer_id,
        status,
        plan:id::TEXT AS plan_id,  -- Snowflake VARIANT access
        plan:interval::TEXT AS billing_interval,
        plan:amount::INTEGER AS plan_amount_cents,
        quantity,
        CONVERT_TIMEZONE('UTC', created) AS created_at,
        CONVERT_TIMEZONE('UTC', current_period_start) AS period_start,
        CONVERT_TIMEZONE('UTC', current_period_end) AS period_end,
        CONVERT_TIMEZONE('UTC', canceled_at) AS cancelled_at,
        cancel_at_period_end,
        metadata:internal_user_id::TEXT AS internal_user_id
    FROM source
    WHERE _fivetran_deleted = FALSE  -- Fivetran soft-delete column
)

SELECT * FROM renamed
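Between staging and marts sits the intermediate layer, where cross-source joins live. A sketch of int_subscriptions_enriched from the tree above, assuming stg_app__users exposes user_id, email, and signed_up_at columns (illustrative names, not shown earlier):

-- models/intermediate/int_subscriptions_enriched.sql (illustrative sketch)
WITH subscriptions AS (
    SELECT * FROM {{ ref('stg_stripe__subscriptions') }}
),

users AS (
    SELECT * FROM {{ ref('stg_app__users') }}
)

SELECT
    s.*,
    u.email,
    u.signed_up_at
FROM subscriptions AS s
LEFT JOIN users AS u
    ON s.internal_user_id = u.user_id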
Mart: Monthly Recurring Revenue
-- models/marts/finance/fct_revenue.sql
-- One row per MRR event (new, expansion, contraction, churn)
WITH subscriptions AS (
    SELECT * FROM {{ ref('stg_stripe__subscriptions') }}
),

monthly_mrr AS (
    SELECT
        DATE_TRUNC('month', period_start) AS month,
        period_start,
        subscription_id,
        stripe_customer_id,
        internal_user_id,
        -- Normalize all billing intervals to monthly MRR
        CASE billing_interval
            WHEN 'month' THEN plan_amount_cents * quantity
            WHEN 'year' THEN plan_amount_cents * quantity / 12
            ELSE plan_amount_cents * quantity
        END AS mrr_cents,
        status
    FROM subscriptions
),

-- LAG lives in its own CTE: referencing the mrr_cents alias inside a window
-- function in the same SELECT is not portable across warehouses
with_previous AS (
    SELECT
        *,
        LAG(mrr_cents) OVER (
            PARTITION BY subscription_id ORDER BY period_start
        ) AS prev_mrr_cents,
        LAG(status) OVER (
            PARTITION BY subscription_id ORDER BY period_start
        ) AS prev_status
    FROM monthly_mrr
),

mrr_events AS (
    SELECT
        month,
        subscription_id,
        internal_user_id,
        mrr_cents,
        prev_mrr_cents,
        CASE
            WHEN prev_status IS NULL AND status = 'active'
                THEN 'new'
            WHEN prev_status = 'active' AND status IN ('canceled', 'unpaid')
                THEN 'churn'
            WHEN prev_status IN ('canceled', 'unpaid') AND status = 'active'
                THEN 'reactivation'
            WHEN mrr_cents > COALESCE(prev_mrr_cents, 0)
                THEN 'expansion'
            WHEN mrr_cents < COALESCE(prev_mrr_cents, 0)
                THEN 'contraction'
            ELSE 'unchanged'
        END AS event_type,
        mrr_cents - COALESCE(prev_mrr_cents, 0) AS mrr_delta_cents
    FROM with_previous
    WHERE status != 'trialing'  -- Trials don't count as revenue
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['month', 'subscription_id']) }} AS revenue_event_id,
    month,
    subscription_id,
    internal_user_id,
    event_type,
    mrr_cents,
    mrr_delta_cents,
    mrr_cents / 100.0 AS mrr_dollars,
    mrr_delta_cents / 100.0 AS mrr_delta_dollars,
    mrr_cents * 12 / 100.0 AS arr_dollars
FROM mrr_events
WHERE event_type != 'unchanged'
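With fct_revenue in place, the classic MRR movement chart is a simple aggregation in the BI layer. A sketch (the analytics schema name is an assumption; note that a churn row carries the subscription's last MRR, so churned MRR is derived from mrr_dollars rather than the delta):

-- Example BI-layer query: MRR movement by month (schema name assumed)
SELECT
    month,
    SUM(CASE WHEN event_type = 'new'         THEN mrr_delta_dollars ELSE 0 END) AS new_mrr,
    SUM(CASE WHEN event_type = 'expansion'   THEN mrr_delta_dollars ELSE 0 END) AS expansion_mrr,
    SUM(CASE WHEN event_type = 'contraction' THEN mrr_delta_dollars ELSE 0 END) AS contraction_mrr,
    SUM(CASE WHEN event_type = 'churn'       THEN -mrr_dollars      ELSE 0 END) AS churned_mrr
FROM analytics.fct_revenue
GROUP BY month
ORDER BY month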
dbt Tests for Data Quality
# models/marts/finance/fct_revenue.yml
version: 2

models:
  - name: fct_revenue
    description: One row per MRR-changing event per subscription per month
    columns:
      - name: revenue_event_id
        tests:
          - unique
          - not_null
      - name: event_type
        tests:
          - accepted_values:
              values: ['new', 'expansion', 'contraction', 'churn', 'reactivation']
      - name: mrr_cents
        tests:
          - not_null
          # Custom test: MRR must be positive. At column level, dbt_utils
          # prepends the column name, so the expression is just the comparison.
          - dbt_utils.expression_is_true:
              expression: "> 0"
      - name: internal_user_id
        tests:
          - relationships:
              to: ref('stg_app__users')
              field: user_id
              # Warn rather than fail: some Stripe customers have no internal account
              config: { severity: warn }
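Generic tests catch column-level issues; singular tests encode business rules. One sketch that follows from the model's logic above (a subscription should only ever produce one 'new' event, since later recoveries are tagged 'reactivation'):

-- tests/assert_one_new_event_per_subscription.sql (illustrative singular test)
-- Returns rows, and therefore fails, if any subscription has >1 'new' event
SELECT
    subscription_id,
    COUNT(*) AS new_events
FROM {{ ref('fct_revenue') }}
WHERE event_type = 'new'
GROUP BY subscription_id
HAVING COUNT(*) > 1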
Cost Optimization: Warehouse + dbt
-- Snowflake doesn't partition tables; instead, cluster large incremental
-- models on a date column so the optimizer can prune micro-partitions.
-- dbt config block at the top of the model SQL:
{{ config(
    materialized='incremental',
    cluster_by=['month'],            -- clustering key for scan pruning
    incremental_strategy='merge',
    unique_key='revenue_event_id',
    on_schema_change='fail'          -- fail loudly on upstream schema changes
) }}

-- Incremental logic: only reprocess a trailing window of new data
{% if is_incremental() %}
WHERE period_start >= (SELECT MAX(month) FROM {{ this }}) - INTERVAL '2 months'
{% endif %}
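To see where the credits actually go, Snowflake exposes per-warehouse metering in the ACCOUNT_USAGE schema. A quick audit query (requires access to the shared SNOWFLAKE database):

-- Credits consumed per virtual warehouse per month
SELECT
    warehouse_name,
    DATE_TRUNC('month', start_time) AS month,
    ROUND(SUM(credits_used), 1) AS credits
FROM snowflake.account_usage.warehouse_metering_history
GROUP BY 1, 2
ORDER BY month DESC, credits DESC;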
Working With Viprasol
We build modern data stacks for SaaS companies, from warehouse selection and ELT pipeline setup through dbt model architecture and BI layer integration.
What we deliver:
- Warehouse selection and setup (Snowflake/BigQuery/Redshift)
- Fivetran or Airbyte pipeline configuration for Stripe, Salesforce, HubSpot
- dbt project structure with staging/intermediate/marts layers
- Revenue and SaaS metrics data models (MRR, NRR, cohort analysis)
- dbt testing strategy and data quality monitoring
→ Discuss your data warehouse needs → AI and analytics services
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.