
SaaS Data Warehouse in 2026: dbt, Snowflake vs BigQuery vs Redshift, and ELT Pipelines

Build a SaaS data warehouse: dbt model architecture, Snowflake vs BigQuery vs Redshift comparison, Fivetran/Airbyte ELT pipelines, and data quality testing for analytics.

Viprasol Tech Team
August 19, 2026
14 min read


Most SaaS companies outgrow Postgres as their analytics database somewhere between $1M and $5M ARR. The combination of complex join-heavy queries, growing data volume, and the need to blend data from multiple sources (Stripe, Salesforce, your app DB, Intercom) makes a dedicated data warehouse essential.

The modern data stack in 2026: Fivetran or Airbyte extracts from your sources and loads raw data into Snowflake, BigQuery, or Redshift. dbt transforms that raw data into clean, documented, tested analytics models. BI tools (Metabase, Looker, Superset) query the dbt models. This post covers each layer with production patterns.


Warehouse Comparison: Snowflake vs BigQuery vs Redshift

| | Snowflake | BigQuery | Redshift Serverless |
|---|---|---|---|
| Pricing model | Credits per second (compute) + storage | Per TB scanned (on-demand) or flat reservation | Per RPU-hour + storage |
| Scaling | Instant virtual warehouse resize | Automatic (serverless) | Automatic (serverless v2) |
| Concurrency | Multi-cluster warehouses | Unlimited slots (reservation) | Managed automatically |
| Time travel | 90 days (Enterprise) | 7 days | N/A (snapshots) |
| Semi-structured data | Excellent (VARIANT type, FLATTEN) | Excellent (REPEATED/RECORD types) | Good (SUPER type) |
| dbt support | ✅ Best-in-class | ✅ Excellent | ✅ Good |
| Cloud availability | Multi-cloud | GCP only | AWS only |
| Cost at $10K/month data | ~$800/month | ~$500/month | ~$600/month |
| Cost at $100K/month data | ~$4,000/month | ~$3,000/month | ~$3,500/month |

Recommendation by use case:

  • Already on AWS, existing Redshift: stick with Redshift Serverless
  • Already on GCP / BigQuery ecosystem: BigQuery
  • Multi-cloud or greenfield: Snowflake (best dbt integration, time travel, data sharing)
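To sanity-check numbers like the ones in the table for your own workload, the two billing schemes can be modelled in a few lines. Every price below is a placeholder assumption, not a quoted rate; real pricing varies by region, edition, and negotiated discounts, so check each vendor's current price list.

```python
# Back-of-envelope cost models for credit-based vs per-scan pricing.
# All rates are illustrative assumptions, not actual vendor prices.

def snowflake_monthly_cost(
    credits_per_day: float,
    price_per_credit: float = 3.0,   # assumed per-credit rate
    storage_tb: float = 1.0,
    storage_per_tb: float = 23.0,    # assumed on-demand storage rate
) -> float:
    """Snowflake bills compute in credits plus storage per TB-month."""
    return credits_per_day * 30 * price_per_credit + storage_tb * storage_per_tb

def bigquery_monthly_cost(
    tb_scanned: float,
    price_per_tb: float = 6.25,      # assumed on-demand scan rate
    storage_tb: float = 1.0,
    storage_per_tb: float = 20.0,    # assumed active-storage rate
) -> float:
    """BigQuery on-demand bills per TB scanned plus active storage."""
    return tb_scanned * price_per_tb + storage_tb * storage_per_tb

print(snowflake_monthly_cost(credits_per_day=8))   # 743.0
print(bigquery_monthly_cost(tb_scanned=100))       # 645.0
```

The crossover point depends almost entirely on query patterns: always-on BI dashboards favour flat compute (Snowflake, Redshift), while spiky ad-hoc workloads favour per-scan billing.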

ELT Pipeline: Fivetran vs Airbyte

| | Fivetran | Airbyte |
|---|---|---|
| Pricing | Per Monthly Active Row (~$500–$2K/month typical) | Open source (free) or Airbyte Cloud |
| Connector quality | Best-in-class (Stripe, Salesforce, HubSpot) | Good (600+ connectors) |
| Maintenance | Zero; Fivetran handles schema changes | You maintain custom connectors |
| Custom sources | Python SDK | Custom connector framework (CDK) |
| When to choose | Budget allows; zero infra overhead | Cost matters more; technical team |

Fivetran: Connecting Stripe to Snowflake

# fivetran_config.yaml (declarative connector config)
# Configured in the Fivetran dashboard; no YAML needed for standard connectors
# This represents the configuration options:

connector:
  service: stripe
  schema: stripe_raw
  destination: snowflake_prod
  sync_frequency: 360  # minutes (6 hours for most SaaS data)
  
  tables:
    - invoices
    - customers
    - subscriptions
    - charges
    - refunds
    - disputes
    - products
    - prices

# Result in Snowflake:
# stripe_raw.invoices, stripe_raw.customers, etc.
# All columns from Stripe API, including nested JSON expanded to columns

Airbyte: Custom Source Connector

# airbyte_connectors/source-internal-api/source.py
from typing import Any, List, Mapping, Optional, Tuple

import requests

from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream

class UsersStream(HttpStream):
    url_base = "https://api.myapp.com/internal"
    primary_key = "id"

    def path(self, **kwargs) -> str:
        return "users"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping]:
        data = response.json()
        if data["pagination"]["hasNextPage"]:
            return {"page": data["pagination"]["page"] + 1}
        return None

    def request_params(self, next_page_token=None, **kwargs) -> Mapping:
        params = {"perPage": 500}
        if next_page_token:
            params["page"] = next_page_token["page"]
        return params

    def parse_response(self, response: requests.Response, **kwargs):
        yield from response.json()["data"]

    def get_json_schema(self) -> Mapping:
        return {
            "type": "object",
            "properties": {
                "id":         {"type": "string"},
                "email":      {"type": "string"},
                "plan":       {"type": "string"},
                "createdAt":  {"type": "string", "format": "date-time"},
                "mrr":        {"type": "number"},
            },
        }

class SourceInternalAPI(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Optional[Any]]:
        try:
            resp = requests.get(
                f"{config['api_url']}/healthz",
                headers={"Authorization": f"Bearer {config['api_key']}"},
                timeout=5,
            )
            return resp.ok, None
        except Exception as e:
            return False, str(e)

    def streams(self, config) -> List[Stream]:
        return [UsersStream(authenticator=...)]
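The CDK drives a stream by alternating `request_params`, the HTTP call, `parse_response`, and `next_page_token` until the token comes back `None`. A standalone sketch of that read loop, using a `FakeResponse` stand-in so it runs without the CDK or a live API (the payload shape matches the hypothetical internal API above):

```python
# Minimal sketch of the pagination loop Airbyte runs against a stream.
# FakeResponse is a stand-in for requests.Response.

class FakeResponse:
    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload

def next_page_token(response):
    """Same logic as UsersStream.next_page_token above."""
    data = response.json()
    if data["pagination"]["hasNextPage"]:
        return {"page": data["pagination"]["page"] + 1}
    return None

def read_all(pages):
    """Drain every page, mimicking the CDK's read loop."""
    records, page_num = [], 1
    while True:
        resp = FakeResponse(pages[page_num])
        records.extend(resp.json()["data"])
        token = next_page_token(resp)
        if token is None:
            return records
        page_num = token["page"]

pages = {
    1: {"data": [{"id": "u1"}, {"id": "u2"}],
        "pagination": {"page": 1, "hasNextPage": True}},
    2: {"data": [{"id": "u3"}],
        "pagination": {"page": 2, "hasNextPage": False}},
}
print([r["id"] for r in read_all(pages)])  # ['u1', 'u2', 'u3']
```

This is also the piece worth unit-testing in a real connector: off-by-one page numbers and a missing `hasNextPage: false` terminal case are the two most common pagination bugs.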


dbt Project Structure

models/
├── staging/          ← 1:1 with source tables; rename columns, cast types, no joins
│   ├── stripe/
│   │   ├── stg_stripe__customers.sql
│   │   ├── stg_stripe__invoices.sql
│   │   └── stg_stripe__subscriptions.sql
│   └── app/
│       ├── stg_app__users.sql
│       └── stg_app__orders.sql
├── intermediate/     ← Joins and business logic; not exposed to BI
│   ├── int_subscriptions_enriched.sql
│   └── int_revenue_events.sql
└── marts/            ← Final models for BI; organized by business domain
    ├── finance/
    │   ├── fct_revenue.sql       ← Fact table: one row per revenue event
    │   └── dim_customers.sql     ← Dimension table: one row per customer
    └── product/
        ├── fct_events.sql
        └── fct_active_users.sql

Staging Model

-- models/staging/stripe/stg_stripe__subscriptions.sql
-- Rename, cast, and filter raw Stripe data. No business logic.

WITH source AS (
    SELECT * FROM {{ source('stripe_raw', 'subscriptions') }}
),

renamed AS (
    SELECT
        id                                          AS subscription_id,
        customer                                    AS stripe_customer_id,
        status,
        plan:id::TEXT                               AS plan_id,         -- Snowflake VARIANT
        plan:interval::TEXT                         AS billing_interval,
        plan:amount::INTEGER                        AS plan_amount_cents,
        quantity,
        CONVERT_TIMEZONE('UTC', created)            AS created_at,
        CONVERT_TIMEZONE('UTC', current_period_start) AS period_start,
        CONVERT_TIMEZONE('UTC', current_period_end)   AS period_end,
        CONVERT_TIMEZONE('UTC', canceled_at)          AS cancelled_at,
        cancel_at_period_end,
        metadata:internal_user_id::TEXT             AS internal_user_id
    FROM source
    WHERE _fivetran_deleted = FALSE   -- Fivetran soft-delete column
)

SELECT * FROM renamed

Mart: Monthly Recurring Revenue

-- models/marts/finance/fct_revenue.sql
-- One row per MRR event (new, expansion, contraction, churn)

WITH subscriptions AS (
    SELECT * FROM {{ ref('stg_stripe__subscriptions') }}
),

monthly_mrr AS (
    SELECT
        DATE_TRUNC('month', period_start) AS month,
        period_start,
        subscription_id,
        stripe_customer_id,
        internal_user_id,
        status,
        -- Normalize every billing interval to a monthly MRR figure
        CASE billing_interval
            WHEN 'month' THEN plan_amount_cents * quantity
            WHEN 'year'  THEN plan_amount_cents * quantity / 12
            ELSE plan_amount_cents * quantity
        END AS mrr_cents
    FROM subscriptions
),

-- Window functions go in a separate CTE: referencing the mrr_cents alias
-- inside LAG() in the same SELECT relies on Snowflake's lateral column
-- aliasing and breaks on BigQuery and Redshift
with_previous AS (
    SELECT
        *,
        LAG(mrr_cents) OVER (
            PARTITION BY subscription_id ORDER BY period_start
        ) AS prev_mrr_cents,
        LAG(status) OVER (
            PARTITION BY subscription_id ORDER BY period_start
        ) AS prev_status
    FROM monthly_mrr
),

mrr_events AS (
    SELECT
        month,
        subscription_id,
        internal_user_id,
        mrr_cents,
        prev_mrr_cents,
        CASE
            WHEN prev_status IS NULL AND status = 'active'
                THEN 'new'
            WHEN prev_status = 'active' AND status IN ('canceled', 'unpaid')
                THEN 'churn'
            WHEN prev_status IN ('canceled', 'unpaid') AND status = 'active'
                THEN 'reactivation'
            WHEN mrr_cents > COALESCE(prev_mrr_cents, 0)
                THEN 'expansion'
            WHEN mrr_cents < COALESCE(prev_mrr_cents, 0)
                THEN 'contraction'
            ELSE 'unchanged'
        END AS event_type,
        mrr_cents - COALESCE(prev_mrr_cents, 0) AS mrr_delta_cents
    FROM with_previous
    WHERE status != 'trialing'  -- Trials don't count as revenue
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['month', 'subscription_id']) }} AS revenue_event_id,
    month,
    subscription_id,
    internal_user_id,
    event_type,
    mrr_cents,
    mrr_delta_cents,
    mrr_cents / 100.0                AS mrr_dollars,
    mrr_delta_cents / 100.0          AS mrr_delta_dollars,
    mrr_cents * 12 / 100.0           AS arr_dollars
FROM mrr_events
WHERE event_type != 'unchanged'
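Because the event taxonomy is pure branching logic, it is easy to unit-test outside the warehouse. A Python mirror of the CASE expression, kept in the same branch order as the SQL so the two stay comparable:

```python
def classify_mrr_event(status, prev_status, mrr_cents, prev_mrr_cents):
    """Mirror of the CASE expression in fct_revenue (same branch order)."""
    prev = prev_mrr_cents if prev_mrr_cents is not None else 0
    if prev_status is None and status == "active":
        return "new"
    if prev_status == "active" and status in ("canceled", "unpaid"):
        return "churn"
    if prev_status in ("canceled", "unpaid") and status == "active":
        return "reactivation"
    if mrr_cents > prev:
        return "expansion"
    if mrr_cents < prev:
        return "contraction"
    return "unchanged"

print(classify_mrr_event("active", None, 5000, None))        # new
print(classify_mrr_event("active", "active", 9900, 5000))    # expansion
print(classify_mrr_event("canceled", "active", 5000, 5000))  # churn
print(classify_mrr_event("active", "canceled", 5000, 5000))  # reactivation
```

Note that branch order matters: churn and reactivation are checked before the amount comparisons, so a status change always wins over a price change in the same period.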

dbt Tests for Data Quality

# models/marts/finance/fct_revenue.yml
version: 2

models:
  - name: fct_revenue
    description: One row per MRR-changing event per subscription per month
    columns:
      - name: revenue_event_id
        tests:
          - unique
          - not_null

      - name: event_type
        tests:
          - accepted_values:
              values: ['new', 'expansion', 'contraction', 'churn', 'reactivation']

      - name: mrr_cents
        tests:
          - not_null
          # dbt_utils package test: MRR must be positive for active subs
          - dbt_utils.expression_is_true:
              expression: "mrr_cents > 0"

      - name: internal_user_id
        tests:
          - relationships:
              to: ref('stg_app__users')
              field: user_id
              # Allow nulls for Stripe customers without internal accounts
              config: { severity: warn }

Cost Optimization: Warehouse + dbt

-- Snowflake: cluster large incremental models on a date column to prune scans
-- dbt config block in your model SQL (SQL comments can't live inside
-- the Jinja call itself, so they go outside the braces)

{{ config(
    materialized='incremental',
    cluster_by=['month'],
    incremental_strategy='merge',
    unique_key='revenue_event_id',
    on_schema_change='fail'
) }}
-- cluster_by sets the clustering key for scan pruning;
-- on_schema_change='fail' makes the run fail loudly on schema drift

-- Incremental logic: only process new data
{% if is_incremental() %}
WHERE period_start >= (SELECT MAX(month) FROM {{ this }}) - INTERVAL '2 months'
{% endif %}
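In that filter, `MAX(month) - INTERVAL '2 months'` deliberately reprocesses the trailing two months so late-arriving subscription updates get merged rather than missed. The equivalent date arithmetic, sketched in Python; this is safe on first-of-month dates, which `DATE_TRUNC('month', ...)` guarantees:

```python
from datetime import date

def lookback_cutoff(max_month: date, months: int = 2) -> date:
    """Equivalent of MAX(month) - INTERVAL '2 months' in the
    incremental filter, for first-of-month (or fixed-day) dates."""
    total = max_month.year * 12 + (max_month.month - 1) - months
    return date(total // 12, total % 12 + 1, max_month.day)

print(lookback_cutoff(date(2026, 7, 1)))  # 2026-05-01
```

Widen the window if your sources can restate older periods (for example, refunds landing months after the original charge); the cost is re-merging a few extra months of rows per run.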


Working With Viprasol

We build modern data stacks for SaaS companies: from warehouse selection and ELT pipeline setup through dbt model architecture and BI layer integration.

What we deliver:

  • Warehouse selection and setup (Snowflake/BigQuery/Redshift)
  • Fivetran or Airbyte pipeline configuration for Stripe, Salesforce, HubSpot
  • dbt project structure with staging/intermediate/marts layers
  • Revenue and SaaS metrics data models (MRR, NRR, cohort analysis)
  • dbt testing strategy and data quality monitoring

→ Discuss your data warehouse needs → AI and analytics services



About the Author


Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA Development · AI Agent Systems · SaaS Development · Algorithmic Trading
