Back to Blog

Background Jobs Architecture: BullMQ, Retries, Dead Letter Queues, and Concurrency

Build a production background job system with BullMQ: configure job queues, implement exponential backoff retries, route failed jobs to dead letter queues, tune worker concurrency, and monitor queue health with Prometheus.

Viprasol Tech Team
October 4, 2026
13 min read

Background jobs are where most production reliability problems hide. The HTTP request returns 200, the job is queued, and somewhere in the background it silently fails โ€” no retry, no alerting, no dead letter queue. The user never finds out their export wasn't generated or their email wasn't sent.

BullMQ on Redis solves this with persistent queues, configurable retries, and first-class dead letter queue (DLQ) support. The architecture question isn't whether to use a job queue โ€” it's how to configure it correctly for your failure modes.


Queue Architecture Overview

Producer (API server)
  โ”‚
  โ”œโ”€โ”€ emailQueue        โ†’ email-worker (concurrency: 10)
  โ”œโ”€โ”€ exportQueue       โ†’ export-worker (concurrency: 3, heavy CPU)
  โ”œโ”€โ”€ notificationQueue โ†’ notification-worker (concurrency: 20)
  โ””โ”€โ”€ webhookQueue      โ†’ webhook-worker (concurrency: 5)

Failed jobs (after maxAttempts) โ†’  DLQ (dead letter queue)
  โ””โ”€โ”€ Manual review / reprocessing

Queue and Worker Setup

// src/queues/index.ts
import { Queue, QueueEvents } from "bullmq";
import { Redis } from "ioredis";

// Separate Redis connection for each queue component
// BullMQ requires dedicated connections (no multiplexing)
const connection = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null, // Required by BullMQ
  enableReadyCheck: false,
});

// Shared queue options
const defaultQueueOptions = {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential" as const,
      delay: 1000, // Start at 1s, then 2s, 4s
    },
    removeOnComplete: {
      age: 24 * 3600,   // Keep completed jobs for 24 hours
      count: 1000,       // Keep last 1000 completed jobs
    },
    removeOnFail: false, // Keep failed jobs for DLQ processing
  },
};

// Define queue payloads as discriminated union for type safety
export interface EmailJobData {
  type: "welcome" | "password-reset" | "invoice" | "notification";
  to: string;
  userId: string;
  templateData: Record<string, unknown>;
}

export interface ExportJobData {
  type: "csv" | "pdf" | "xlsx";
  userId: string;
  tenantId: string;
  filters: Record<string, unknown>;
  notifyEmail: string;
}

export interface WebhookJobData {
  url: string;
  payload: unknown;
  secret: string;
  tenantId: string;
  eventType: string;
  attemptNumber?: number;
}

// Create queues
export const emailQueue = new Queue<EmailJobData>("email", defaultQueueOptions);
export const exportQueue = new Queue<ExportJobData>("export", {
  ...defaultQueueOptions,
  defaultJobOptions: {
    ...defaultQueueOptions.defaultJobOptions,
    attempts: 2,          // Exports are expensive โ€” limit retries
    timeout: 5 * 60_000,  // 5-minute timeout
  },
});
export const webhookQueue = new Queue<WebhookJobData>("webhook", {
  ...defaultQueueOptions,
  defaultJobOptions: {
    ...defaultQueueOptions.defaultJobOptions,
    attempts: 5,          // Webhooks: retry more aggressively
    backoff: {
      type: "exponential",
      delay: 5000,        // Start at 5s for webhooks
    },
  },
});

// Dead letter queue โ€” receives jobs that exhausted all retries
export const dlq = new Queue("dead-letter", { connection });

๐ŸŒ Looking for a Dev Team That Actually Delivers?

Most agencies sell you a project manager and assign juniors. Viprasol is different โ€” senior engineers only, direct Slack access, and a 5.0โ˜… Upwork record across 100+ projects.

  • React, Next.js, Node.js, TypeScript โ€” production-grade stack
  • Fixed-price contracts โ€” no surprise invoices
  • Full source code ownership from day one
  • 90-day post-launch support included

Worker Implementation

// src/workers/email.worker.ts
import { Worker, Job, UnrecoverableError } from "bullmq";
import { Redis } from "ioredis";
import { EmailJobData, dlq } from "../queues";

const connection = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null,
});

export const emailWorker = new Worker<EmailJobData>(
  "email",
  async (job: Job<EmailJobData>) => {
    const { type, to, userId, templateData } = job.data;

    // Update progress for long-running jobs
    await job.updateProgress(10);

    try {
      // Route to appropriate email handler
      switch (type) {
        case "welcome":
          await sendWelcomeEmail(to, templateData);
          break;
        case "invoice":
          await sendInvoiceEmail(to, templateData);
          break;
        case "password-reset":
          await sendPasswordResetEmail(to, templateData);
          break;
        default:
          throw new UnrecoverableError(`Unknown email type: ${type}`);
          // UnrecoverableError: BullMQ will NOT retry โ€” goes straight to failed
      }

      await job.updateProgress(100);

      // Return value is stored with completed job
      return { sentAt: new Date().toISOString(), provider: "resend" };
    } catch (error) {
      // Classify errors: recoverable vs unrecoverable
      if (error instanceof EmailAddressInvalidError) {
        // Don't retry invalid addresses โ€” burn through retries
        throw new UnrecoverableError(
          `Invalid email address: ${to}. ${error.message}`
        );
      }

      // All other errors: let BullMQ retry according to backoff config
      throw error;
    }
  },
  {
    connection,
    concurrency: 10, // Process 10 emails simultaneously per worker process

    // Rate limiting: max 100 emails per 10 seconds (Resend free tier)
    limiter: {
      max: 100,
      duration: 10_000,
    },
  }
);

// Handle job lifecycle events
emailWorker.on("completed", (job, result) => {
  console.log(`[email] Job ${job.id} completed:`, result);
});

emailWorker.on("failed", async (job, error) => {
  if (!job) return;

  console.error(`[email] Job ${job.id} failed (attempt ${job.attemptsMade}):`, error.message);

  // Move to DLQ after all retries exhausted
  if (job.attemptsMade >= (job.opts.attempts ?? 3)) {
    await dlq.add(
      "failed-email",
      {
        originalQueue: "email",
        jobId: job.id,
        jobData: job.data,
        error: error.message,
        failedAt: new Date().toISOString(),
        attemptsMade: job.attemptsMade,
      },
      { removeOnComplete: false }
    );
  }
});

emailWorker.on("error", (error) => {
  console.error("[email] Worker error:", error);
});

Dead Letter Queue Processing

// src/workers/dlq.worker.ts
// Process the DLQ: alert, log to database, enable manual retry

import { Worker, Job } from "bullmq";

interface DLQJobData {
  originalQueue: string;
  jobId: string | undefined;
  jobData: unknown;
  error: string;
  failedAt: string;
  attemptsMade: number;
}

const dlqWorker = new Worker<DLQJobData>(
  "dead-letter",
  async (job: Job<DLQJobData>) => {
    const { originalQueue, jobId, jobData, error, failedAt, attemptsMade } =
      job.data;

    // 1. Persist to database for audit trail and manual review
    await db.query(
      `INSERT INTO failed_jobs
       (original_queue, original_job_id, job_data, error_message, failed_at, attempts_made)
       VALUES ($1, $2, $3, $4, $5, $6)`,
      [
        originalQueue,
        jobId,
        JSON.stringify(jobData),
        error,
        failedAt,
        attemptsMade,
      ]
    );

    // 2. Alert on-call if the failure rate is high
    const recentFailures = await db.query<{ count: string }>(
      `SELECT COUNT(*)::text FROM failed_jobs
       WHERE original_queue = $1
         AND failed_at > NOW() - INTERVAL '1 hour'`,
      [originalQueue]
    );

    const failureCount = parseInt(recentFailures.rows[0].count);
    if (failureCount >= 10) {
      await alertOncall({
        severity: "warning",
        message: `${failureCount} jobs failed in ${originalQueue} queue in the last hour`,
        queue: originalQueue,
        runbook: "https://runbooks.internal/job-queue-failures",
      });
    }

    // 3. Notify affected user (where applicable)
    if (originalQueue === "export") {
      const exportData = jobData as { notifyEmail: string; type: string };
      await sendSystemEmail(exportData.notifyEmail, "export-failed", {
        exportType: exportData.type,
        supportLink: "https://viprasol.com/contact",
      });
    }
  },
  { connection, concurrency: 1 } // Process DLQ items one at a time
);

๐Ÿš€ Senior Engineers. No Junior Handoffs. Ever.

You get the senior developer, not a project manager who relays your requirements to someone you never meet. Every Viprasol project has a senior lead from kickoff to launch.

  • MVPs in 4โ€“8 weeks, full platforms in 3โ€“5 months
  • Lighthouse 90+ performance scores standard
  • Works across US, UK, AU timezones
  • Free 30-min architecture review, no commitment

Job Producers: Enqueueing Jobs

// src/services/email.service.ts
import { emailQueue } from "../queues";

export class EmailService {
  // Simple enqueue
  async sendWelcomeEmail(userId: string, email: string): Promise<void> {
    await emailQueue.add(
      "welcome",
      {
        type: "welcome",
        to: email,
        userId,
        templateData: { signupDate: new Date().toISOString() },
      },
      {
        // Job-level overrides
        jobId: `welcome-${userId}`, // Idempotent: won't add duplicate
        delay: 5 * 60_000,          // Send 5 minutes after signup
      }
    );
  }

  // Bulk enqueue with rate limiting
  async sendBulkNotifications(
    userIds: string[],
    message: string
  ): Promise<void> {
    const jobs = userIds.map((userId, index) => ({
      name: "notification",
      data: {
        type: "notification" as const,
        to: `user-${userId}@example.com`,
        userId,
        templateData: { message },
      },
      opts: {
        delay: index * 100, // Stagger by 100ms to avoid thundering herd
        priority: 10,       // Lower priority than transactional emails
      },
    }));

    await emailQueue.addBulk(jobs);
  }

  // Scheduled/recurring jobs
  async scheduleMonthlyReport(tenantId: string): Promise<void> {
    await emailQueue.add(
      "monthly-report",
      {
        type: "notification",
        to: "admin@tenant.com",
        userId: "system",
        templateData: { tenantId, reportMonth: new Date().toISOString() },
      },
      {
        repeat: {
          pattern: "0 8 1 * *", // 8am on the 1st of every month (cron)
          tz: "America/New_York",
        },
        jobId: `monthly-report-${tenantId}`, // Stable ID for the repeated job
      }
    );
  }
}

Queue Monitoring

// src/monitoring/queue-metrics.ts
import { QueueEvents, Queue } from "bullmq";
import { Gauge, Counter, register } from "prom-client";

const queueDepth = new Gauge({
  name: "bullmq_queue_depth",
  help: "Number of jobs waiting in queue",
  labelNames: ["queue", "status"],
  registers: [register],
});

const jobsProcessed = new Counter({
  name: "bullmq_jobs_processed_total",
  help: "Total jobs processed",
  labelNames: ["queue", "outcome"],
  registers: [register],
});

export async function collectQueueMetrics(queues: Queue[]) {
  for (const queue of queues) {
    const counts = await queue.getJobCounts(
      "waiting",
      "active",
      "delayed",
      "failed",
      "completed"
    );

    queueDepth.set({ queue: queue.name, status: "waiting" }, counts.waiting);
    queueDepth.set({ queue: queue.name, status: "active" }, counts.active);
    queueDepth.set({ queue: queue.name, status: "delayed" }, counts.delayed);
    queueDepth.set({ queue: queue.name, status: "failed" }, counts.failed);

    // Alert if DLQ is growing
    if (queue.name === "dead-letter" && counts.waiting > 50) {
      console.error(`DLQ depth: ${counts.waiting} โ€” investigate failed jobs`);
    }
  }
}

// BullMQ Board โ€” web UI for queue inspection (development + staging)
// npm install @bull-board/fastify @bull-board/api
import { createBullBoard } from "@bull-board/api";
import { BullMQAdapter } from "@bull-board/api/bullMQAdapter";
import { FastifyAdapter } from "@bull-board/fastify";

export function setupBullBoard(
  app: FastifyInstance,
  queues: Queue[]
): void {
  const serverAdapter = new FastifyAdapter();
  serverAdapter.setBasePath("/admin/queues");

  createBullBoard({
    queues: queues.map((q) => new BullMQAdapter(q)),
    serverAdapter,
  });

  app.register(serverAdapter.registerPlugin(), {
    prefix: "/admin/queues",
    basePath: "/admin/queues",
  });
}

Concurrency Tuning Guide

Worker TypeConcurrencyReasoning
Email sending10โ€“20I/O-bound, limited by provider rate limits
PDF/report generation2โ€“4CPU-bound, memory-intensive
Webhook delivery5โ€“10I/O-bound, external latency
Database-heavy jobs3โ€“5Limited by connection pool
External API calls10โ€“20I/O-bound, watch API rate limits
Image processing2โ€“3CPU + memory intensive

Rule: concurrency ร— workers ร— avg_job_duration_ms / 1000 = sustained throughput in jobs/sec


See Also


Working With Viprasol

Background job reliability is non-negotiable for SaaS products where users expect async operations (exports, report generation, email sequences) to complete correctly. We design job queue architectures with proper retry policies, dead letter queues, concurrency limits, and monitoring so failures are caught and resolved before users notice.

Backend engineering โ†’ | Talk to our engineers โ†’

Share this article:

About the Author

V

Viprasol Tech Team

Custom Software Development Specialists

The Viprasol Tech team specialises in algorithmic trading software, AI agent systems, and SaaS development. With 100+ projects delivered across MT4/MT5 EAs, fintech platforms, and production AI systems, the team brings deep technical experience to every engagement. Based in India, serving clients globally.

MT4/MT5 EA DevelopmentAI Agent SystemsSaaS DevelopmentAlgorithmic Trading

Need a Modern Web Application?

From landing pages to complex SaaS platforms โ€” we build it all with Next.js and React.

Free consultation โ€ข No commitment โ€ข Response within 24 hours

Viprasol ยท Web Development

Need a custom web application built?

We build React and Next.js web applications with Lighthouse โ‰ฅ90 scores, mobile-first design, and full source code ownership. Senior engineers only โ€” from architecture through deployment.