Event-Driven Processing with Queues & Workers

Introduced asynchronous, event-driven processing to eliminate cascading failures and improve system resilience under load.

Role: Senior Backend Engineer

The Challenge

Synchronous workflows caused cascading failures during traffic spikes and third-party outages. A single slow or failing dependency could block user-facing requests and leave transactions in inconsistent states.

The Solution

I redesigned critical workflows around an event-driven model using queues and background workers. Operations were broken into idempotent steps, retries were handled automatically, and failures were isolated without affecting user-facing flows.

Deep Dive

Why Synchronous Flows Failed

Synchronous API flows assumed ideal conditions. In production, network delays, retries, and partial failures made this assumption unsafe.

Requests would time out after the gateway had already processed them, leaving transactions in an unknown state. Retries compounded the problem, creating duplicate operations and inconsistent data.
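To make that failure mode concrete, here is a minimal sketch of the synchronous pattern (the endpoint, the 5-second timeout, and the handler name are illustrative, not from the original system):

api/sync-deposit.ts (illustrative)
async function handleDepositRequest(req: { userId: string; amount: number }) {
  // Abort the request from the caller's side after 5 seconds.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 5_000);

  try {
    // The gateway may still complete the charge after this aborts,
    // leaving the transaction in an unknown state on our side.
    const res = await fetch('https://gateway.example.com/charge', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(req),
      signal: controller.signal,
    });
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}

A naive retry of this call after a timeout is exactly what produced duplicate operations.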

Event-Driven Design

All critical operations were converted into messages. Each step in the process became an independent unit of work handled asynchronously by workers.

This decoupled request handling from execution and allowed the system to absorb traffic spikes without cascading failures.
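As a sketch of what the producer side looks like under this design (the Queue interface, the topic name, and the handler are assumptions, not the original code):

api/deposit.producer.ts (illustrative)
import { randomUUID } from 'node:crypto';

interface Queue {
  send(topic: string, message: unknown): Promise<void>;
}

declare const queue: Queue; // injected; any broker client fits this shape

async function requestDeposit(userId: string, amount: number) {
  const message = {
    // Generated once and reused on every retry of this request.
    idempotencyKey: randomUUID(),
    payload: { userId, amount },
  };

  // Enqueue and return immediately; a worker executes the step later.
  await queue.send('deposits', message);
  return { status: 'accepted', trackingId: message.idempotencyKey };
}

The request handler's only job is to durably record intent; everything slow or failure-prone happens in the worker.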

Idempotency as a First-Class Constraint

Every message carried a unique idempotency key. Workers always checked whether an operation had already been processed before executing it.

This ensured retries were safe and prevented duplicate side effects even during crashes or redeployments.

workers/deposit.consumer.ts
async function processMessage(message: PaymentMessage) {
  const { idempotencyKey, payload } = message;

  // Idempotency check: if this key was already processed,
  // return the stored result instead of executing again.
  const existing = await store.get(idempotencyKey);
  if (existing) {
    return existing;
  }

  try {
    const result = await handlePayment(payload);

    // Persist the result against the key so future retries
    // are no-ops; keep it for 7 days (ttl is in seconds).
    await store.set(idempotencyKey, result, {
      ttl: 7 * 24 * 60 * 60,
    });

    return result;
  } catch (error) {
    if (isRetryable(error)) {
      throw error; // re-queue with backoff
    }

    // Non-retryable: record the failure and acknowledge the
    // message so it is not re-queued.
    await markAsFailed(payload, error);
  }
}

Failure Handling

Retries used exponential backoff to avoid overwhelming external services. Messages that exceeded retry limits were moved to a dead-letter queue for manual inspection.

This ensured failures were visible and actionable instead of silently lost.
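A sketch of that policy, assuming the broker supports delayed re-publishing (MAX_ATTEMPTS, requeueWithDelay, and deadLetterQueue are illustrative names, not the original code):

workers/retry-policy.ts (illustrative)
interface PaymentMessage {
  idempotencyKey: string;
  payload: unknown;
  attempt: number;
}

declare const deadLetterQueue: { send(m: PaymentMessage): Promise<void> };
declare function requeueWithDelay(m: PaymentMessage, delayMs: number): Promise<void>;

const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1_000;

function backoffMs(attempt: number): number {
  // Exponential backoff with full jitter, capped at 60 seconds.
  const cap = Math.min(60_000, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(Math.random() * cap);
}

async function onRetryableFailure(message: PaymentMessage) {
  if (message.attempt >= MAX_ATTEMPTS) {
    // Exhausted retries: park the message for manual inspection.
    await deadLetterQueue.send(message);
    return;
  }
  await requeueWithDelay(
    { ...message, attempt: message.attempt + 1 },
    backoffMs(message.attempt),
  );
}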

Production Reality

Retries without idempotency will eventually duplicate data or lose money.

Observability

Queue depth, retry counts, processing latency, and failure reasons were tracked as first-class metrics.

This made system behavior transparent and allowed issues to be detected before users were affected.
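One way to expose those metrics, sketched here with the prom-client library (metric names and histogram buckets are illustrative):

workers/metrics.ts (illustrative, using prom-client)
import { Counter, Gauge, Histogram } from 'prom-client';

export const queueDepth = new Gauge({
  name: 'worker_queue_depth',
  help: 'Messages currently waiting in the queue',
});

export const retries = new Counter({
  name: 'worker_retries_total',
  help: 'Message retries, labeled by queue',
  labelNames: ['queue'],
});

export const processingLatency = new Histogram({
  name: 'worker_processing_seconds',
  help: 'Time spent processing a single message',
  buckets: [0.05, 0.1, 0.5, 1, 5, 10],
});

export const failures = new Counter({
  name: 'worker_failures_total',
  help: 'Terminal failures, labeled by reason',
  labelNames: ['reason'],
});

Inside the worker, a timer wraps each message: const end = processingLatency.startTimer() before processing, end() after, and retries.inc({ queue: 'deposits' }) on each re-queue.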

Outcome

The system became resilient to spikes, retries, and partial outages. Failures were isolated, recoverable, and observable instead of catastrophic.

Interested in similar results?

Let's discuss how I can help with your project.

Get in Touch