Background Jobs Architecture: BullMQ, Retries, Dead Letter Queues, and Concurrency
Build a production background job system with BullMQ: configure job queues, implement exponential backoff retries, route failed jobs to dead letter queues, tune worker concurrency, and monitor queue health with Prometheus.
Background jobs are where most production reliability problems hide. The HTTP request returns 200, the job is queued, and somewhere in the background it silently fails: no retry, no alerting, no dead letter queue. The user never finds out their export wasn't generated or their email wasn't sent.
BullMQ on Redis solves this with persistent queues, configurable retries, and first-class dead letter queue (DLQ) support. The architecture question isn't whether to use a job queue; it's how to configure it correctly for your failure modes.
Queue Architecture Overview
Producer (API server)
  │
  ├── emailQueue        → email-worker        (concurrency: 10)
  ├── exportQueue       → export-worker       (concurrency: 3, heavy CPU)
  ├── notificationQueue → notification-worker (concurrency: 20)
  └── webhookQueue      → webhook-worker      (concurrency: 5)

Failed jobs (after maxAttempts) → DLQ (dead letter queue)
  └── Manual review / reprocessing
Queue and Worker Setup
// src/queues/index.ts
import { Queue } from "bullmq";
import { Redis } from "ioredis";
// BullMQ requires maxRetriesPerRequest: null on its connections.
// Queues can share one ioredis connection; each Worker should get its own,
// because workers issue blocking Redis commands that would stall a shared one.
const connection = new Redis(process.env.REDIS_URL!, {
maxRetriesPerRequest: null, // Required by BullMQ
enableReadyCheck: false,
});
// Shared queue options
const defaultQueueOptions = {
connection,
defaultJobOptions: {
attempts: 3,
backoff: {
type: "exponential" as const,
delay: 1000, // First retry after 1s, then 2s (doubles each retry)
},
removeOnComplete: {
age: 24 * 3600, // Keep completed jobs for 24 hours
count: 1000, // Keep last 1000 completed jobs
},
removeOnFail: false, // Keep failed jobs for DLQ processing
},
};
// Define queue payloads as discriminated union for type safety
export interface EmailJobData {
type: "welcome" | "password-reset" | "invoice" | "notification";
to: string;
userId: string;
templateData: Record<string, unknown>;
}
export interface ExportJobData {
type: "csv" | "pdf" | "xlsx";
userId: string;
tenantId: string;
filters: Record<string, unknown>;
notifyEmail: string;
}
export interface WebhookJobData {
url: string;
payload: unknown;
secret: string;
tenantId: string;
eventType: string;
attemptNumber?: number;
}
// Create queues
export const emailQueue = new Queue<EmailJobData>("email", defaultQueueOptions);
export const exportQueue = new Queue<ExportJobData>("export", {
...defaultQueueOptions,
defaultJobOptions: {
...defaultQueueOptions.defaultJobOptions,
attempts: 2, // Exports are expensive, so limit retries
// Note: BullMQ has no per-job timeout option (Bull v3's `timeout` was dropped);
// enforce long-running limits inside the processor instead
},
});
export const webhookQueue = new Queue<WebhookJobData>("webhook", {
...defaultQueueOptions,
defaultJobOptions: {
...defaultQueueOptions.defaultJobOptions,
attempts: 5, // Webhooks: retry more aggressively
backoff: {
type: "exponential",
delay: 5000, // Start at 5s for webhooks
},
},
});
// Dead letter queue: receives jobs that exhausted all retries
export const dlq = new Queue("dead-letter", { connection });
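The exponential backoff configured above yields a predictable retry schedule: with BullMQ's built-in exponential strategy, the delay before the n-th retry is `delay * 2^(n - 1)`. A small helper (hypothetical, not part of the BullMQ API) makes the schedule explicit for a given queue config:

```typescript
// Retry schedule for BullMQ's built-in exponential backoff:
// the delay before retry n is baseDelayMs * 2^(n - 1).
// `attempts` counts the initial try, so there are (attempts - 1) retries.
// Hypothetical helper for illustration; not part of the BullMQ API.
export function backoffSchedule(baseDelayMs: number, attempts: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}

// Default queue (attempts: 3, delay: 1000): retries after 1s, then 2s.
// Webhook queue (attempts: 5, delay: 5000): retries after 5s, 10s, 20s, 40s.
```

Useful when tuning: the total time a job can spend retrying is the sum of this schedule, which bounds how stale a failed webhook or email can get before it lands in the DLQ.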
Worker Implementation
// src/workers/email.worker.ts
import { Worker, Job, UnrecoverableError } from "bullmq";
import { Redis } from "ioredis";
import { EmailJobData, dlq } from "../queues";
// sendWelcomeEmail, sendInvoiceEmail, sendPasswordResetEmail, and
// EmailAddressInvalidError are app-specific helpers defined elsewhere
const connection = new Redis(process.env.REDIS_URL!, {
maxRetriesPerRequest: null,
});
export const emailWorker = new Worker<EmailJobData>(
"email",
async (job: Job<EmailJobData>) => {
const { type, to, userId, templateData } = job.data;
// Update progress for long-running jobs
await job.updateProgress(10);
try {
// Route to appropriate email handler
switch (type) {
case "welcome":
await sendWelcomeEmail(to, templateData);
break;
case "invoice":
await sendInvoiceEmail(to, templateData);
break;
case "password-reset":
await sendPasswordResetEmail(to, templateData);
break;
default:
// UnrecoverableError tells BullMQ NOT to retry: the job fails immediately
throw new UnrecoverableError(`Unknown email type: ${type}`);
}
await job.updateProgress(100);
// Return value is stored with completed job
return { sentAt: new Date().toISOString(), provider: "resend" };
} catch (error) {
// Classify errors: recoverable vs unrecoverable
if (error instanceof EmailAddressInvalidError) {
// Don't retry invalid addresses; retrying would only burn through attempts
throw new UnrecoverableError(
`Invalid email address: ${to}. ${error.message}`
);
}
// All other errors: let BullMQ retry according to backoff config
throw error;
}
},
{
connection,
concurrency: 10, // Process 10 emails simultaneously per worker process
// Rate limiting: max 100 emails per 10 seconds (Resend free tier)
limiter: {
max: 100,
duration: 10_000,
},
}
);
// Handle job lifecycle events
emailWorker.on("completed", (job, result) => {
console.log(`[email] Job ${job.id} completed:`, result);
});
emailWorker.on("failed", async (job, error) => {
if (!job) return;
console.error(`[email] Job ${job.id} failed (attempt ${job.attemptsMade}):`, error.message);
// Move to DLQ after all retries exhausted
if (job.attemptsMade >= (job.opts.attempts ?? 3)) {
await dlq.add(
"failed-email",
{
originalQueue: "email",
jobId: job.id,
jobData: job.data,
error: error.message,
failedAt: new Date().toISOString(),
attemptsMade: job.attemptsMade,
},
{ removeOnComplete: false }
);
}
});
emailWorker.on("error", (error) => {
console.error("[email] Worker error:", error);
});
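The recoverable-vs-unrecoverable decision in the catch block is worth factoring out as workers multiply. One sketch (helper name and status-code policy are assumptions, not BullMQ conventions) classifies failures by HTTP status, the kind of signal an email or webhook provider typically returns:

```typescript
// Classify a provider failure by HTTP status:
// - 429 (rate limited) and 5xx are transient: worth retrying with backoff
// - other 4xx (bad request, auth, invalid recipient) are permanent: don't retry
// Hypothetical helper for illustration.
export type FailureKind = "retry" | "unrecoverable";

export function classifyHttpFailure(status: number): FailureKind {
  if (status === 429) return "retry"; // rate-limited: back off and retry
  if (status >= 400 && status < 500) return "unrecoverable"; // permanent client error
  return "retry"; // 5xx and anything else: assume transient
}
```

In a worker's catch block this pairs naturally with `UnrecoverableError`: `if (classifyHttpFailure(status) === "unrecoverable") throw new UnrecoverableError(...)`, otherwise rethrow and let the backoff config handle it.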
Dead Letter Queue Processing
// src/workers/dlq.worker.ts
// Process the DLQ: alert, log to database, enable manual retry
import { Worker, Job } from "bullmq";
import { Redis } from "ioredis";
// db, alertOncall, and sendSystemEmail are app-specific helpers defined elsewhere
const connection = new Redis(process.env.REDIS_URL!, {
maxRetriesPerRequest: null,
});
interface DLQJobData {
originalQueue: string;
jobId: string | undefined;
jobData: unknown;
error: string;
failedAt: string;
attemptsMade: number;
}
const dlqWorker = new Worker<DLQJobData>(
"dead-letter",
async (job: Job<DLQJobData>) => {
const { originalQueue, jobId, jobData, error, failedAt, attemptsMade } =
job.data;
// 1. Persist to database for audit trail and manual review
await db.query(
`INSERT INTO failed_jobs
(original_queue, original_job_id, job_data, error_message, failed_at, attempts_made)
VALUES ($1, $2, $3, $4, $5, $6)`,
[
originalQueue,
jobId,
JSON.stringify(jobData),
error,
failedAt,
attemptsMade,
]
);
// 2. Alert on-call if the failure rate is high
const recentFailures = await db.query<{ count: string }>(
`SELECT COUNT(*)::text FROM failed_jobs
WHERE original_queue = $1
AND failed_at > NOW() - INTERVAL '1 hour'`,
[originalQueue]
);
const failureCount = parseInt(recentFailures.rows[0].count, 10);
if (failureCount >= 10) {
await alertOncall({
severity: "warning",
message: `${failureCount} jobs failed in ${originalQueue} queue in the last hour`,
queue: originalQueue,
runbook: "https://runbooks.internal/job-queue-failures",
});
}
// 3. Notify affected user (where applicable)
if (originalQueue === "export") {
const exportData = jobData as { notifyEmail: string; type: string };
await sendSystemEmail(exportData.notifyEmail, "export-failed", {
exportType: exportData.type,
supportLink: "https://viprasol.com/contact",
});
}
},
{ connection, concurrency: 1 } // Process DLQ items one at a time
);
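Manual reprocessing then needs a way to turn a `failed_jobs` row back into something `Queue.addBulk` accepts. A sketch (helper name is hypothetical; the row shape mirrors the INSERT above):

```typescript
// Map a failed_jobs row back into a BullMQ bulk-add entry so an operator can
// re-enqueue it on the original queue after fixing the underlying issue.
// Hypothetical helper; row shape mirrors the failed_jobs INSERT above.
interface FailedJobRow {
  original_queue: string;
  original_job_id: string | null;
  job_data: string; // JSON string as stored in the database
}

export function toReenqueueEntry(row: FailedJobRow) {
  return {
    name: `replay-${row.original_queue}`,
    data: JSON.parse(row.job_data),
    opts: {
      attempts: 1, // one manual replay; don't re-enter the retry loop
      jobId: row.original_job_id
        ? `replay-${row.original_job_id}` // stable ID keeps replays idempotent
        : undefined,
    },
  };
}
```

An admin endpoint could then do something like `await emailQueue.addBulk(rows.map(toReenqueueEntry))` after the root cause is fixed.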
Job Producers: Enqueueing Jobs
// src/services/email.service.ts
import { emailQueue } from "../queues";
export class EmailService {
// Simple enqueue
async sendWelcomeEmail(userId: string, email: string): Promise<void> {
await emailQueue.add(
"welcome",
{
type: "welcome",
to: email,
userId,
templateData: { signupDate: new Date().toISOString() },
},
{
// Job-level overrides
jobId: `welcome-${userId}`, // Idempotent: won't add duplicate
delay: 5 * 60_000, // Send 5 minutes after signup
}
);
}
// Bulk enqueue with rate limiting
async sendBulkNotifications(
userIds: string[],
message: string
): Promise<void> {
const jobs = userIds.map((userId, index) => ({
name: "notification",
data: {
type: "notification" as const,
to: `user-${userId}@example.com`,
userId,
templateData: { message },
},
opts: {
delay: index * 100, // Stagger by 100ms to avoid thundering herd
priority: 10, // Lower priority than transactional emails
},
}));
await emailQueue.addBulk(jobs);
}
// Scheduled/recurring jobs
async scheduleMonthlyReport(tenantId: string): Promise<void> {
await emailQueue.add(
"monthly-report",
{
type: "notification",
to: "admin@tenant.com",
userId: "system",
templateData: { tenantId, reportMonth: new Date().toISOString() },
},
{
repeat: {
pattern: "0 8 1 * *", // 8am on the 1st of every month (cron)
tz: "America/New_York",
},
jobId: `monthly-report-${tenantId}`, // Stable ID for the repeated job
}
);
}
}
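The webhook worker itself isn't shown here, but the `secret` field on WebhookJobData implies signed deliveries. A common pattern (a sketch; the scheme is an illustrative assumption, not a standard) is an HMAC-SHA256 over the serialized payload, verified with a constant-time comparison on the receiving side:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign a webhook payload with the per-tenant secret; the receiver recomputes
// the HMAC over the raw body and compares. Illustrative sketch: header name
// and exact scheme vary by product and are assumptions here.
export function signWebhookPayload(payload: unknown, secret: string): string {
  const body = JSON.stringify(payload);
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Constant-time comparison avoids leaking signature bytes via timing
export function verifyWebhookSignature(
  body: string,
  signature: string,
  secret: string
): boolean {
  const expected = createHmac("sha256", secret).update(body).digest("hex");
  const a = Buffer.from(signature, "hex");
  const b = Buffer.from(expected, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The worker would send the signature in a header (e.g. `X-Webhook-Signature`, name hypothetical) alongside the JSON body, and consumers verify before trusting the event.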
Queue Monitoring
// src/monitoring/queue-metrics.ts
import { Queue } from "bullmq";
import { Gauge, Counter, register } from "prom-client";
const queueDepth = new Gauge({
name: "bullmq_queue_depth",
help: "Number of jobs waiting in queue",
labelNames: ["queue", "status"],
registers: [register],
});
const jobsProcessed = new Counter({
name: "bullmq_jobs_processed_total",
help: "Total jobs processed",
labelNames: ["queue", "outcome"],
registers: [register],
});
export async function collectQueueMetrics(queues: Queue[]) {
for (const queue of queues) {
const counts = await queue.getJobCounts(
"waiting",
"active",
"delayed",
"failed",
"completed"
);
queueDepth.set({ queue: queue.name, status: "waiting" }, counts.waiting);
queueDepth.set({ queue: queue.name, status: "active" }, counts.active);
queueDepth.set({ queue: queue.name, status: "delayed" }, counts.delayed);
queueDepth.set({ queue: queue.name, status: "failed" }, counts.failed);
// Alert if DLQ is growing
if (queue.name === "dead-letter" && counts.waiting > 50) {
console.error(`DLQ depth: ${counts.waiting}. Investigate failed jobs.`);
}
}
}
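The gauge updates inside collectQueueMetrics can be factored into a pure mapping from the `getJobCounts` result to label/value pairs, which makes the metric shape unit-testable without Redis or prom-client (helper name is hypothetical):

```typescript
// Flatten a BullMQ getJobCounts() result into gauge samples. Pure function:
// the Redis call and the prom-client Gauge stay at the edges.
// Hypothetical helper for illustration.
export function toDepthSamples(
  queueName: string,
  counts: Record<string, number>
): Array<{ labels: { queue: string; status: string }; value: number }> {
  const statuses = ["waiting", "active", "delayed", "failed"] as const;
  return statuses.map((status) => ({
    labels: { queue: queueName, status },
    value: counts[status] ?? 0, // missing statuses report 0, not NaN
  }));
}
```

collectQueueMetrics then reduces to `toDepthSamples(queue.name, counts).forEach((s) => queueDepth.set(s.labels, s.value))`.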
// BullMQ Board โ web UI for queue inspection (development + staging)
// npm install @bull-board/fastify @bull-board/api
import { createBullBoard } from "@bull-board/api";
import { BullMQAdapter } from "@bull-board/api/bullMQAdapter";
import { FastifyAdapter } from "@bull-board/fastify";
import type { FastifyInstance } from "fastify";
export function setupBullBoard(
app: FastifyInstance,
queues: Queue[]
): void {
const serverAdapter = new FastifyAdapter();
serverAdapter.setBasePath("/admin/queues");
createBullBoard({
queues: queues.map((q) => new BullMQAdapter(q)),
serverAdapter,
});
app.register(serverAdapter.registerPlugin(), {
prefix: "/admin/queues",
basePath: "/admin/queues",
});
}
Concurrency Tuning Guide
| Worker Type | Concurrency | Reasoning |
|---|---|---|
| Email sending | 10–20 | I/O-bound, limited by provider rate limits |
| PDF/report generation | 2–4 | CPU-bound, memory-intensive |
| Webhook delivery | 5–10 | I/O-bound, external latency |
| Database-heavy jobs | 3–5 | Limited by connection pool |
| External API calls | 10–20 | I/O-bound, watch API rate limits |
| Image processing | 2–3 | CPU + memory intensive |
Rule of thumb: sustained throughput (jobs/sec) ≈ (concurrency × workers × 1000) / avg_job_duration_ms
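As a worked example: 2 worker processes at concurrency 10, with jobs averaging 500 ms, sustain about 40 jobs/sec. A small helper (hypothetical) captures the arithmetic:

```typescript
// Sustained throughput for a pool of identical workers: each of the
// (workers * concurrency) slots completes one job per avgJobDurationMs,
// i.e. 1000 / avgJobDurationMs jobs per second per slot.
// Hypothetical helper for illustration.
export function sustainedThroughput(
  concurrency: number,
  workers: number,
  avgJobDurationMs: number
): number {
  return (concurrency * workers * 1000) / avgJobDurationMs;
}

// Example: 2 workers x concurrency 10, 500 ms jobs
// sustainedThroughput(10, 2, 500) -> 40 jobs/sec
```

Run the arithmetic in reverse when capacity planning: given a target jobs/sec and a measured job duration, solve for the worker count you need.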
See Also
- WebSocket Scaling: real-time delivery alternative
- Database Connection Pooling: pool sizing for workers
- Incident On-Call Culture: alerting on job failures
- Serverless Architecture: when to use Lambda instead
Working With Viprasol
Background job reliability is non-negotiable for SaaS products where users expect async operations (exports, report generation, email sequences) to complete correctly. We design job queue architectures with proper retry policies, dead letter queues, concurrency limits, and monitoring so failures are caught and resolved before users notice.
About the Author
Viprasol Tech Team
Custom Software Development Specialists
The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.