PostgreSQL CTEs and Recursive Queries: Tree Traversal, Hierarchical Data, and WITH Patterns
Master PostgreSQL Common Table Expressions (CTEs) and recursive queries. Covers WITH clauses, recursive tree traversal, organizational charts, bill of materials, path enumeration, and query optimization.
Hierarchical data is everywhere in SaaS products: comment threads, folder structures, organizational charts, product categories, task dependencies, bill of materials. PostgreSQL's WITH RECURSIVE lets you traverse these structures in pure SQL — no application-level loops, no multiple round-trips, no graph database required.
This guide covers CTE fundamentals, recursive query mechanics, the three patterns for hierarchical data storage, and the performance pitfalls to avoid.
CTE Fundamentals
A Common Table Expression (CTE) is a named subquery scoped to a single statement. Think of it as a temporary view that only exists during query execution.
-- Basic CTE: named subquery
WITH active_workspaces AS (
SELECT id, name, plan
FROM workspaces
WHERE status = 'active'
AND created_at > NOW() - INTERVAL '90 days'
),
workspace_member_counts AS (
SELECT workspace_id, COUNT(*) AS member_count
FROM workspace_members
GROUP BY workspace_id
)
SELECT
aw.name,
aw.plan,
COALESCE(wmc.member_count, 0) AS members
FROM active_workspaces aw
LEFT JOIN workspace_member_counts wmc ON wmc.workspace_id = aw.id
ORDER BY members DESC;
When to use CTEs vs subqueries:
- Use CTEs when you reference the same subquery multiple times
- Use CTEs to break complex queries into readable named steps
- Use subqueries when PostgreSQL's optimizer needs to push predicates down (CTEs can be optimization fences in older PostgreSQL — this changed in PG 12+)
- Use
WITH MATERIALIZEDto force CTE materialization when you want it evaluated once
-- Force materialization (evaluated once, not inlined)
WITH MATERIALIZED expensive_calculation AS (
SELECT user_id, complex_aggregate_function(data)
FROM large_table
GROUP BY user_id
)
SELECT * FROM expensive_calculation WHERE user_id = $1;
-- Allow inlining (PostgreSQL 12+ default for non-recursive)
WITH NOT MATERIALIZED recent_users AS (
SELECT * FROM users WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_users WHERE status = 'active';
Hierarchical Data Storage Patterns
Pattern 1: Adjacency List (Parent ID)
The simplest approach: each row stores its parent's ID.
CREATE TABLE categories (
id BIGSERIAL PRIMARY KEY,
parent_id BIGINT REFERENCES categories(id) ON DELETE CASCADE,
name TEXT NOT NULL,
slug TEXT NOT NULL UNIQUE,
sort_order INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_categories_parent ON categories(parent_id);
-- Sample data
INSERT INTO categories (id, parent_id, name, slug) VALUES
(1, NULL, 'Electronics', 'electronics'),
(2, 1, 'Computers', 'computers'),
(3, 1, 'Mobile Phones', 'mobile-phones'),
(4, 2, 'Laptops', 'laptops'),
(5, 2, 'Desktops', 'desktops'),
(6, 3, 'Smartphones', 'smartphones'),
(7, 3, 'Feature Phones', 'feature-phones'),
(8, 4, 'Gaming Laptops', 'gaming-laptops'),
(9, 4, 'Business Laptops', 'business-laptops');
Pros: Simple to understand, easy inserts/updates. Cons: Querying the full tree requires recursive SQL.
Pattern 2: Materialized Path (ltree)
Store the full path as a string: electronics.computers.laptops.
CREATE EXTENSION ltree;
CREATE TABLE categories_ltree (
id BIGSERIAL PRIMARY KEY,
path LTREE NOT NULL,
name TEXT NOT NULL
);
CREATE INDEX idx_categories_path_gist ON categories_ltree USING GIST(path);
CREATE INDEX idx_categories_path_btree ON categories_ltree USING BTREE(path);
INSERT INTO categories_ltree (path, name) VALUES
('electronics', 'Electronics'),
('electronics.computers', 'Computers'),
('electronics.computers.laptops', 'Laptops'),
('electronics.computers.laptops.gaming', 'Gaming Laptops');
-- Get all descendants of 'computers' — no recursion needed!
SELECT * FROM categories_ltree
WHERE path <@ 'electronics.computers';
-- Get direct children only
SELECT * FROM categories_ltree
WHERE path ~ 'electronics.computers.*{1}';
-- Get path from root to a node
SELECT * FROM categories_ltree
WHERE 'electronics.computers.laptops' @> path;
-- Get depth
SELECT nlevel(path) AS depth, name FROM categories_ltree;
Pros: Fast subtree queries without recursion, depth is trivial. Cons: Moving a subtree requires updating all descendant paths.
Pattern 3: Nested Sets (Modified Preorder Tree Traversal)
CREATE TABLE categories_nested (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
lft INTEGER NOT NULL,
rgt INTEGER NOT NULL,
depth INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_cat_nested_lft ON categories_nested(lft);
CREATE INDEX idx_cat_nested_rgt ON categories_nested(rgt);
-- Electronics: lft=1, rgt=18
-- Computers: lft=2, rgt=11 (children: lft 3–10)
-- A node and all its descendants: WHERE lft BETWEEN parent.lft AND parent.rgt
-- Get all descendants of Computers (lft=2, rgt=11)
SELECT * FROM categories_nested
WHERE lft BETWEEN 2 AND 11
ORDER BY lft;
-- Get ancestors (breadcrumb)
SELECT * FROM categories_nested n
JOIN categories_nested node ON node.lft BETWEEN n.lft AND n.rgt
WHERE node.id = 8 -- Gaming Laptops
ORDER BY n.lft;
Pros: Ultra-fast subtree and ancestor queries. Cons: Inserts are expensive (requires renumbering). Best for read-heavy, rarely-modified trees.
☁️ Is Your Cloud Costing Too Much?
Most teams overspend 30–40% on cloud — wrong instance types, no reserved pricing, bloated storage. We audit, right-size, and automate your infrastructure.
- AWS, GCP, Azure certified engineers
- Infrastructure as Code (Terraform, CDK)
- Docker, Kubernetes, GitHub Actions CI/CD
- Typical audit recovers $500–$3,000/month in savings
WITH RECURSIVE: Mechanics
-- Anatomy of a recursive CTE
WITH RECURSIVE cte_name AS (
-- 1. Anchor member: starting rows (non-recursive)
SELECT ...
UNION ALL -- (or UNION to deduplicate)
-- 2. Recursive member: references cte_name
-- Executes repeatedly until no new rows are added
SELECT ... FROM cte_name JOIN ...
)
SELECT * FROM cte_name;
How PostgreSQL executes this:
- Evaluates the anchor member, stores results in a "working table"
- Evaluates the recursive member using the working table
- Appends results to the working table, replaces working table with new results
- Repeats until the recursive member returns zero rows
- Returns all accumulated results
Traversing the Adjacency List Tree
Get All Descendants
-- All descendants of category 'Computers' (id=2)
WITH RECURSIVE descendants AS (
-- Anchor: start at the target node
SELECT id, parent_id, name, slug, 0 AS depth
FROM categories
WHERE id = 2
UNION ALL
-- Recursive: find children of current level
SELECT c.id, c.parent_id, c.name, c.slug, d.depth + 1
FROM categories c
INNER JOIN descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants ORDER BY depth, name;
-- Result:
-- id=2 Computers (depth 0)
-- id=4 Laptops (depth 1)
-- id=5 Desktops (depth 1)
-- id=8 Gaming Laptops (depth 2)
-- id=9 Business Lapt. (depth 2)
Get All Ancestors (Breadcrumb Path)
-- Full path from root to 'Gaming Laptops' (id=8)
WITH RECURSIVE ancestors AS (
-- Anchor: start at the target node
SELECT id, parent_id, name, 0 AS level
FROM categories
WHERE id = 8
UNION ALL
-- Recursive: go UP to parent
SELECT c.id, c.parent_id, c.name, a.level + 1
FROM categories c
INNER JOIN ancestors a ON c.id = a.parent_id
)
SELECT * FROM ancestors ORDER BY level DESC;
-- Result (reversed for breadcrumb display):
-- Electronics → Computers → Laptops → Gaming Laptops
Build Full Path String
WITH RECURSIVE paths AS (
SELECT id, parent_id, name,
name AS path,
ARRAY[id] AS id_path
FROM categories
WHERE parent_id IS NULL -- Start at roots
UNION ALL
SELECT c.id, c.parent_id, c.name,
p.path || ' > ' || c.name,
p.id_path || c.id
FROM categories c
INNER JOIN paths p ON c.parent_id = p.id
)
SELECT id, name, path, id_path
FROM paths
ORDER BY path;
-- Result:
-- Electronics
-- Electronics > Computers
-- Electronics > Computers > Laptops
-- Electronics > Computers > Laptops > Gaming Laptops
-- etc.
⚙️ DevOps Done Right — Zero Downtime, Full Automation
Ship faster without breaking things. We build CI/CD pipelines, monitoring stacks, and auto-scaling infrastructure that your team can actually maintain.
- Staging + production environments with feature flags
- Automated security scanning in the pipeline
- Uptime monitoring + alerting + runbook automation
- On-call support handover docs included
Real-World Examples
Organizational Chart
CREATE TABLE employees (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
title TEXT NOT NULL,
manager_id BIGINT REFERENCES employees(id),
department TEXT,
salary NUMERIC(10,2)
);
-- Everyone who reports to CEO (id=1), at any depth
WITH RECURSIVE org_chart AS (
SELECT id, name, title, manager_id, department, 0 AS depth,
ARRAY[id] AS path
FROM employees
WHERE id = 1
UNION ALL
SELECT e.id, e.name, e.title, e.manager_id, e.department,
oc.depth + 1,
oc.path || e.id
FROM employees e
INNER JOIN org_chart oc ON e.manager_id = oc.id
WHERE NOT e.id = ANY(oc.path) -- Cycle detection!
)
SELECT
repeat(' ', depth) || name AS indented_name,
title,
department,
depth
FROM org_chart
ORDER BY path;
-- Total headcount and salary budget per manager
WITH RECURSIVE reports AS (
SELECT id, name, manager_id, salary, id AS root_manager_id
FROM employees
UNION ALL
SELECT e.id, e.name, e.manager_id, e.salary, r.root_manager_id
FROM employees e
INNER JOIN reports r ON e.manager_id = r.id
)
SELECT
root_manager_id,
m.name AS manager_name,
COUNT(*) AS total_reports,
SUM(salary) AS total_salary_budget
FROM reports r
JOIN employees m ON m.id = r.root_manager_id
GROUP BY root_manager_id, m.name
ORDER BY total_reports DESC;
Comment Threads (Reddit-Style)
CREATE TABLE comments (
id BIGSERIAL PRIMARY KEY,
post_id BIGINT NOT NULL,
parent_id BIGINT REFERENCES comments(id),
author_id BIGINT NOT NULL,
body TEXT NOT NULL,
upvotes INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Fetch entire comment tree for a post, ordered by thread
WITH RECURSIVE comment_tree AS (
-- Top-level comments
SELECT id, parent_id, author_id, body, upvotes, created_at,
0 AS depth,
LPAD(id::text, 10, '0') AS sort_path
FROM comments
WHERE post_id = $1 AND parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, c.author_id, c.body, c.upvotes, c.created_at,
ct.depth + 1,
ct.sort_path || '/' || LPAD(c.id::text, 10, '0')
FROM comments c
INNER JOIN comment_tree ct ON c.parent_id = ct.id
WHERE ct.depth < 10 -- Limit nesting depth
)
SELECT
id,
depth,
repeat(' ', depth) || body AS indented_body,
upvotes,
created_at
FROM comment_tree
ORDER BY sort_path; -- Sort by thread path preserves reply order
Bill of Materials (Parts Explosion)
CREATE TABLE parts (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
part_number TEXT NOT NULL UNIQUE,
unit_cost NUMERIC(10,4) NOT NULL
);
CREATE TABLE bom (
parent_part_id BIGINT NOT NULL REFERENCES parts(id),
child_part_id BIGINT NOT NULL REFERENCES parts(id),
quantity NUMERIC(10,3) NOT NULL,
PRIMARY KEY (parent_part_id, child_part_id)
);
-- Total cost explosion for a product (includes all sub-components)
WITH RECURSIVE cost_explosion AS (
-- Anchor: the top-level product
SELECT b.child_part_id, b.quantity, p.name, p.unit_cost,
b.quantity AS cumulative_quantity
FROM bom b
JOIN parts p ON p.id = b.child_part_id
WHERE b.parent_part_id = $1 -- product id
UNION ALL
-- Recursive: sub-components
SELECT b.child_part_id, b.quantity, p.name, p.unit_cost,
ce.cumulative_quantity * b.quantity
FROM bom b
JOIN parts p ON p.id = b.child_part_id
JOIN cost_explosion ce ON b.parent_part_id = ce.child_part_id
)
SELECT
name,
SUM(cumulative_quantity) AS total_quantity,
unit_cost,
SUM(cumulative_quantity * unit_cost) AS total_cost
FROM cost_explosion
GROUP BY child_part_id, name, unit_cost
ORDER BY total_cost DESC;
Task Dependency Graph (Critical Path)
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
duration INTEGER NOT NULL -- days
);
CREATE TABLE task_dependencies (
task_id BIGINT NOT NULL REFERENCES tasks(id),
depends_on BIGINT NOT NULL REFERENCES tasks(id),
PRIMARY KEY (task_id, depends_on)
);
-- Find longest path (critical path) through dependency graph
WITH RECURSIVE critical_path AS (
-- Anchor: tasks with no dependencies (start nodes)
SELECT t.id, t.name, t.duration, t.duration AS earliest_finish,
ARRAY[t.id] AS path, FALSE AS has_cycle
FROM tasks t
WHERE NOT EXISTS (
SELECT 1 FROM task_dependencies WHERE task_id = t.id
)
UNION ALL
SELECT t.id, t.name, t.duration,
cp.earliest_finish + t.duration,
cp.path || t.id,
t.id = ANY(cp.path) -- Cycle detection
FROM tasks t
JOIN task_dependencies td ON td.task_id = t.id
JOIN critical_path cp ON cp.id = td.depends_on
WHERE NOT cp.has_cycle
)
SELECT
path,
name AS final_task,
earliest_finish AS total_duration_days,
array_length(path, 1) AS task_count
FROM critical_path
ORDER BY earliest_finish DESC
LIMIT 1; -- The critical path is the longest
Performance Considerations
-- Prevent infinite loops with depth limit
WITH RECURSIVE tree AS (
SELECT id, parent_id, 0 AS depth
FROM nodes WHERE id = $1
UNION ALL
SELECT n.id, n.parent_id, t.depth + 1
FROM nodes n
JOIN tree t ON n.parent_id = t.id
WHERE t.depth < 20 -- Hard limit prevents runaway recursion
)
SELECT * FROM tree;
-- Cycle detection with path array
WITH RECURSIVE tree AS (
SELECT id, parent_id, ARRAY[id] AS visited
FROM nodes WHERE id = $1
UNION ALL
SELECT n.id, n.parent_id, t.visited || n.id
FROM nodes n
JOIN tree t ON n.parent_id = t.id
WHERE NOT n.id = ANY(t.visited) -- Stop if cycle detected
)
SELECT * FROM tree;
-- Check for index usage with EXPLAIN
EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE ...
-- Critical index: parent_id must be indexed for recursive joins to be fast
CREATE INDEX idx_nodes_parent_id ON nodes(parent_id);
-- Without this index, each recursive step does a seq scan
When to Use Each Pattern
| Scenario | Best Pattern | Reason |
|---|---|---|
| Unlimited nesting, reads dominant | ltree | O(1) subtree queries, no recursion |
| Nesting rarely changes, fast reads | Nested sets | Subtree = range scan |
| Frequent moves/inserts | Adjacency list + WITH RECURSIVE | Simple updates, any-depth traversal |
| Depth ≤ 5, simple | Adjacency list | Minimal complexity |
| Graph (not tree — multi-parent) | Adjacency list | Only pattern supporting DAGs |
| Need breadcrumbs frequently | ltree or path-building CTE | Path is stored/computed once |
Cost and Timeline Estimates
| Scope | Team | Timeline | Cost Range |
|---|---|---|---|
| Adjacency list schema + basic CTE queries | 1 dev | 0.5–1 day | $200–500 |
| Full hierarchical system (ltree + UI) | 1 dev | 2–3 days | $600–1,200 |
| Complex graph traversal (BOM, critical path) | 1–2 devs | 1 week | $2,000–4,000 |
| Migration from flat to hierarchical structure | 1–2 devs | 1–2 weeks | $3,000–6,000 |
See Also
- PostgreSQL Window Functions for Analytics
- PostgreSQL Full-Text Search
- PostgreSQL Performance Tuning: EXPLAIN ANALYZE and Indexes
- PostgreSQL Materialized Views for Analytics
- PostgreSQL Partitioning for Large Tables
Working With Viprasol
Hierarchical data modelling is one of those database decisions that's easy to get wrong and expensive to migrate later. Our team has designed and queried tree structures in PostgreSQL for org charts, e-commerce categories, multi-level comment systems, and supply chain bill-of-materials — choosing the right pattern for each access profile.
What we deliver:
- Schema design for your specific hierarchy use case
- WITH RECURSIVE queries optimized with proper indexing
- ltree migration for read-heavy trees
- Cycle detection and depth-limiting for production safety
- Query performance benchmarks across approaches
Talk to our team about your data modelling challenge →
Or explore our cloud infrastructure services.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.
Need DevOps & Cloud Expertise?
Scale your infrastructure with confidence. AWS, GCP, Azure certified team.
Free consultation • No commitment • Response within 24 hours
Making sense of your data at scale?
Viprasol builds end-to-end big data analytics solutions — ETL pipelines, data warehouses on Snowflake or BigQuery, and self-service BI dashboards. One reliable source of truth for your entire organisation.