SaaS Data Warehouse in 2026: dbt, Snowflake vs BigQuery vs Redshift, and ELT Pipelines
Build a SaaS data warehouse: dbt model architecture, Snowflake vs BigQuery vs Redshift comparison, Fivetran/Airbyte ELT pipelines, and data quality testing for analytics.
Most SaaS companies outgrow Postgres as their analytics database somewhere between $1M and $5M ARR. The combination of complex join-heavy queries, growing data volume, and the need to blend data from multiple sources (Stripe, Salesforce, your app DB, Intercom) makes a dedicated data warehouse essential.
The modern data stack in 2026: Fivetran or Airbyte extracts from your sources and loads raw data into Snowflake, BigQuery, or Redshift. dbt transforms that raw data into clean, documented, tested analytics models. BI tools (Metabase, Looker, Superset) query the dbt models. This post covers each layer with production patterns.
Warehouse Comparison: Snowflake vs BigQuery vs Redshift
| | Snowflake | BigQuery | Redshift Serverless |
|---|---|---|---|
| Pricing model | Credits per second (compute) + storage | Per TB scanned (on-demand) or flat reservation | Per RPU-hour + storage |
| Scaling | Instant virtual warehouse resize | Automatic (serverless) | Automatic (serverless v2) |
| Concurrency | Multi-cluster warehouses | Unlimited slots (reservation) | Managed automatically |
| Time travel | 90 days (Enterprise) | 7 days | N/A (snapshots) |
| Semi-structured data | Excellent (VARIANT type, FLATTEN) | Excellent (REPEATED/RECORD types) | Good (SUPER type) |
| dbt support | ✅ Best-in-class | ✅ Excellent | ✅ Good |
| Cloud lock-in | Multi-cloud | GCP only | AWS only |
| Typical cost (mid-size workload) | ~$800/month | ~$500/month | ~$600/month |
| Typical cost (large workload) | ~$4,000/month | ~$3,000/month | ~$3,500/month |
Recommendation by use case:
- Already on AWS, existing Redshift: stick with Redshift Serverless
- Already on GCP / BigQuery ecosystem: BigQuery
- Multi-cloud or greenfield: Snowflake (best dbt integration, time travel, data sharing)
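The semi-structured data row is worth a concrete illustration. Here is a minimal Snowflake sketch that explodes nested Stripe invoice line items with LATERAL FLATTEN; the table and column names (stripe_raw.invoices, a VARIANT column named lines) are assumptions for illustration, not a guaranteed Fivetran layout:

-- Snowflake: explode nested JSON with FLATTEN (illustrative; schema assumed)
SELECT
    i.id                        AS invoice_id,
    line.value:price:id::TEXT   AS price_id,
    line.value:amount::INTEGER  AS amount_cents
FROM stripe_raw.invoices AS i,
     LATERAL FLATTEN(input => i.lines:data) AS line;

The BigQuery equivalent uses UNNEST over a REPEATED field; Redshift reaches into SUPER columns with PartiQL dot notation.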
ELT Pipeline: Fivetran vs Airbyte
| | Fivetran | Airbyte |
|---|---|---|
| Pricing | Per Monthly Active Row (~$500–$2K/month typical) | Open source (free) or Airbyte Cloud |
| Connector quality | Best-in-class (Stripe, Salesforce, HubSpot) | Good (600+ connectors) |
| Maintenance | Zero; Fivetran handles schema changes | You maintain custom connectors |
| Custom sources | Python SDK | Custom connector framework |
| When to choose | When budget allows; zero infra overhead | When cost matters more; technical team |
Fivetran: Connecting Stripe to Snowflake
# fivetran_config.yaml (declarative connector config)
# Configured in the Fivetran dashboard; no YAML is needed for standard connectors.
# This represents the equivalent configuration options:
connector:
  service: stripe
  schema: stripe_raw
  destination: snowflake_prod
  sync_frequency: 360  # minutes (6 hours is enough for most SaaS data)
  tables:
    - invoices
    - customers
    - subscriptions
    - charges
    - refunds
    - disputes
    - products
    - prices

# Result in Snowflake:
#   stripe_raw.invoices, stripe_raw.customers, etc.
#   All columns from the Stripe API, with nested JSON expanded to columns.
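Once the raw tables land, declare them as dbt sources so downstream models can reference them and dbt can monitor sync freshness. A minimal sketch, assuming the Fivetran destination database is named raw (adjust to your setup); _fivetran_synced is the timestamp column Fivetran adds to every table:

# models/staging/stripe/_stripe__sources.yml (minimal sketch; database name assumed)
version: 2
sources:
  - name: stripe_raw
    database: raw
    schema: stripe_raw
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: { count: 12, period: hour }
    tables:
      - name: customers
      - name: invoices
      - name: subscriptions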
Airbyte: Custom Source Connector
# airbyte_connectors/source-internal-api/source.py
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests

from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream
from airbyte_cdk.sources.streams.http.requests_native_auth import TokenAuthenticator


class UsersStream(HttpStream):
    # Trailing slash matters: the CDK urljoins path() onto url_base
    url_base = "https://api.myapp.com/internal/"
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "users"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping]:
        # Page-based pagination driven by the API's hasNextPage flag
        data = response.json()
        if data["pagination"]["hasNextPage"]:
            return {"page": data["pagination"]["page"] + 1}
        return None

    def request_params(self, next_page_token=None, **kwargs) -> Mapping:
        params = {"perPage": 500}
        if next_page_token:
            params["page"] = next_page_token["page"]
        return params

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        yield from response.json()["data"]

    def get_json_schema(self) -> Mapping:
        return {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "email": {"type": "string"},
                "plan": {"type": "string"},
                "createdAt": {"type": "string", "format": "date-time"},
                "mrr": {"type": "number"},
            },
        }


class SourceInternalAPI(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Optional[Any]]:
        try:
            resp = requests.get(
                f"{config['api_url']}/healthz",
                headers={"Authorization": f"Bearer {config['api_key']}"},
                timeout=5,
            )
            return resp.ok, None
        except Exception as e:
            return False, str(e)

    def streams(self, config) -> List[Stream]:
        # Bearer-token auth matching the check_connection call above
        auth = TokenAuthenticator(token=config["api_key"])
        return [UsersStream(authenticator=auth)]
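The config['api_url'] and config['api_key'] keys come from the connector's spec, which Airbyte uses to render the setup form. A hypothetical spec matching those keys:

# airbyte_connectors/source-internal-api/spec.yaml (hypothetical; matches the config keys above)
documentationUrl: https://docs.airbyte.com
connectionSpecification:
  $schema: http://json-schema.org/draft-07/schema#
  title: Internal API Source Spec
  type: object
  required: [api_url, api_key]
  properties:
    api_url:
      type: string
      description: Base URL of the internal API
    api_key:
      type: string
      airbyte_secret: true
      description: Bearer token used for authentication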
dbt Project Structure
models/
├── staging/          -- 1:1 with source tables; rename columns, cast types, no joins
│   ├── stripe/
│   │   ├── stg_stripe__customers.sql
│   │   ├── stg_stripe__invoices.sql
│   │   └── stg_stripe__subscriptions.sql
│   └── app/
│       ├── stg_app__users.sql
│       └── stg_app__orders.sql
├── intermediate/     -- joins and business logic; not exposed to BI
│   ├── int_subscriptions_enriched.sql
│   └── int_revenue_events.sql
└── marts/            -- final models for BI; organized by business domain
    ├── finance/
    │   ├── fct_revenue.sql    -- fact table: one row per revenue event
    │   └── dim_customers.sql  -- dimension table: one row per customer
    └── product/
        ├── fct_events.sql
        └── fct_active_users.sql
Staging Model
-- models/staging/stripe/stg_stripe__subscriptions.sql
-- Rename, cast, and filter raw Stripe data. No business logic.
WITH source AS (
    SELECT * FROM {{ source('stripe_raw', 'subscriptions') }}
),

renamed AS (
    SELECT
        id AS subscription_id,
        customer AS stripe_customer_id,
        status,
        plan:id::TEXT AS plan_id,  -- Snowflake VARIANT access
        plan:interval::TEXT AS billing_interval,
        plan:amount::INTEGER AS plan_amount_cents,
        quantity,
        CONVERT_TIMEZONE('UTC', created) AS created_at,
        CONVERT_TIMEZONE('UTC', current_period_start) AS period_start,
        CONVERT_TIMEZONE('UTC', current_period_end) AS period_end,
        CONVERT_TIMEZONE('UTC', canceled_at) AS cancelled_at,
        cancel_at_period_end,
        metadata:internal_user_id::TEXT AS internal_user_id
    FROM source
    WHERE _fivetran_deleted = FALSE  -- Fivetran soft-delete column
)

SELECT * FROM renamed
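Between staging and marts sits the intermediate layer, where cross-source joins live. A sketch of int_subscriptions_enriched from the tree above, assuming stg_app__users exposes user_id, email, and signed_up_at columns (illustrative names, not shown earlier):

-- models/intermediate/int_subscriptions_enriched.sql (illustrative sketch)
WITH subscriptions AS (
    SELECT * FROM {{ ref('stg_stripe__subscriptions') }}
),

users AS (
    SELECT * FROM {{ ref('stg_app__users') }}
)

SELECT
    s.*,
    u.email,
    u.signed_up_at
FROM subscriptions AS s
LEFT JOIN users AS u
    ON s.internal_user_id = u.user_id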
Mart: Monthly Recurring Revenue
-- models/marts/finance/fct_revenue.sql
-- One row per MRR event (new, expansion, contraction, churn)
WITH subscriptions AS (
    SELECT * FROM {{ ref('stg_stripe__subscriptions') }}
),

monthly_mrr AS (
    SELECT
        DATE_TRUNC('month', period_start) AS month,
        period_start,
        subscription_id,
        stripe_customer_id,
        internal_user_id,
        -- Normalize all billing intervals to monthly MRR
        CASE billing_interval
            WHEN 'month' THEN plan_amount_cents * quantity
            WHEN 'year' THEN plan_amount_cents * quantity / 12
            ELSE plan_amount_cents * quantity
        END AS mrr_cents,
        status
    FROM subscriptions
),

-- LAG lives in its own CTE: referencing the mrr_cents alias inside a window
-- function in the same SELECT is not portable across warehouses
with_previous AS (
    SELECT
        *,
        LAG(mrr_cents) OVER (
            PARTITION BY subscription_id ORDER BY period_start
        ) AS prev_mrr_cents,
        LAG(status) OVER (
            PARTITION BY subscription_id ORDER BY period_start
        ) AS prev_status
    FROM monthly_mrr
),

mrr_events AS (
    SELECT
        month,
        subscription_id,
        internal_user_id,
        mrr_cents,
        prev_mrr_cents,
        CASE
            WHEN prev_status IS NULL AND status = 'active'
                THEN 'new'
            WHEN prev_status = 'active' AND status IN ('canceled', 'unpaid')
                THEN 'churn'
            WHEN prev_status IN ('canceled', 'unpaid') AND status = 'active'
                THEN 'reactivation'
            WHEN mrr_cents > COALESCE(prev_mrr_cents, 0)
                THEN 'expansion'
            WHEN mrr_cents < COALESCE(prev_mrr_cents, 0)
                THEN 'contraction'
            ELSE 'unchanged'
        END AS event_type,
        mrr_cents - COALESCE(prev_mrr_cents, 0) AS mrr_delta_cents
    FROM with_previous
    WHERE status != 'trialing'  -- Trials don't count as revenue
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['month', 'subscription_id']) }} AS revenue_event_id,
    month,
    subscription_id,
    internal_user_id,
    event_type,
    mrr_cents,
    mrr_delta_cents,
    mrr_cents / 100.0 AS mrr_dollars,
    mrr_delta_cents / 100.0 AS mrr_delta_dollars,
    mrr_cents * 12 / 100.0 AS arr_dollars
FROM mrr_events
WHERE event_type != 'unchanged'
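With fct_revenue in place, the classic MRR movement chart is a simple aggregation in the BI layer. A sketch (the analytics schema name is an assumption; note that a churn row carries the subscription's last MRR, so churned MRR is derived from mrr_dollars rather than the delta):

-- Example BI-layer query: MRR movement by month (schema name assumed)
SELECT
    month,
    SUM(CASE WHEN event_type = 'new'         THEN mrr_delta_dollars ELSE 0 END) AS new_mrr,
    SUM(CASE WHEN event_type = 'expansion'   THEN mrr_delta_dollars ELSE 0 END) AS expansion_mrr,
    SUM(CASE WHEN event_type = 'contraction' THEN mrr_delta_dollars ELSE 0 END) AS contraction_mrr,
    SUM(CASE WHEN event_type = 'churn'       THEN -mrr_dollars      ELSE 0 END) AS churned_mrr
FROM analytics.fct_revenue
GROUP BY month
ORDER BY month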
dbt Tests for Data Quality
# models/marts/finance/fct_revenue.yml
version: 2

models:
  - name: fct_revenue
    description: One row per MRR-changing event per subscription per month
    columns:
      - name: revenue_event_id
        tests:
          - unique
          - not_null
      - name: event_type
        tests:
          - accepted_values:
              values: ['new', 'expansion', 'contraction', 'churn', 'reactivation']
      - name: mrr_cents
        tests:
          - not_null
          # Custom test: MRR must be positive. At column level, dbt_utils
          # prepends the column name, so the expression is just the comparison.
          - dbt_utils.expression_is_true:
              expression: "> 0"
      - name: internal_user_id
        tests:
          - relationships:
              to: ref('stg_app__users')
              field: user_id
              # Warn rather than fail: some Stripe customers have no internal account
              config: { severity: warn }
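Generic tests catch column-level issues; singular tests encode business rules. One sketch that follows from the model's logic above (a subscription should only ever produce one 'new' event, since later recoveries are tagged 'reactivation'):

-- tests/assert_one_new_event_per_subscription.sql (illustrative singular test)
-- Returns rows, and therefore fails, if any subscription has >1 'new' event
SELECT
    subscription_id,
    COUNT(*) AS new_events
FROM {{ ref('fct_revenue') }}
WHERE event_type = 'new'
GROUP BY subscription_id
HAVING COUNT(*) > 1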
Cost Optimization: Warehouse + dbt
-- Snowflake doesn't partition tables; instead, cluster large incremental
-- models on a date column so the optimizer can prune micro-partitions.
-- dbt config block at the top of the model SQL:
{{ config(
    materialized='incremental',
    cluster_by=['month'],            -- clustering key for scan pruning
    incremental_strategy='merge',
    unique_key='revenue_event_id',
    on_schema_change='fail'          -- fail loudly on upstream schema changes
) }}

-- Incremental logic: only reprocess a trailing window of new data
{% if is_incremental() %}
WHERE period_start >= (SELECT MAX(month) FROM {{ this }}) - INTERVAL '2 months'
{% endif %}
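To see where the credits actually go, Snowflake exposes per-warehouse metering in the ACCOUNT_USAGE schema. A quick audit query (requires access to the shared SNOWFLAKE database):

-- Credits consumed per virtual warehouse per month
SELECT
    warehouse_name,
    DATE_TRUNC('month', start_time) AS month,
    ROUND(SUM(credits_used), 1) AS credits
FROM snowflake.account_usage.warehouse_metering_history
GROUP BY 1, 2
ORDER BY month DESC, credits DESC;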
Working With Viprasol
We build modern data stacks for SaaS companies, from warehouse selection and ELT pipeline setup through dbt model architecture and BI layer integration.
What we deliver:
- Warehouse selection and setup (Snowflake/BigQuery/Redshift)
- Fivetran or Airbyte pipeline configuration for Stripe, Salesforce, HubSpot
- dbt project structure with staging/intermediate/marts layers
- Revenue and SaaS metrics data models (MRR, NRR, cohort analysis)
- dbt testing strategy and data quality monitoring
→ Discuss your data warehouse needs → AI and analytics services
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.