Why Synchronous Flows Failed
Synchronous API flows assumed ideal conditions. In production, network delays, retries, and partial failures made this assumption unsafe.
Requests would time out after the gateway had already processed them, leaving transactions in an unknown state. Retries compounded the problem, creating duplicate operations and inconsistent data.
Event-Driven Design
All critical operations were converted into messages. Each step in the process became an independent unit of work handled asynchronously by workers.
This decoupled request handling from execution and allowed the system to absorb traffic spikes without cascading failures.
Idempotency as a First-Class Constraint
Every message carried a unique idempotency key. Workers always checked whether an operation had already been processed before executing it.
This ensured retries were safe and prevented duplicate side effects even during crashes or redeployments.
async function processMessage(message: PaymentMessage) {
const { idempotencyKey, payload } = message;
// Idempotency check
const existing = await store.get(idempotencyKey);
if (existing) {
return existing;
}
try {
const result = await handlePayment(payload);
await store.set(idempotencyKey, result, {
ttl: 7 * 24 * 60 * 60,
});
return result;
} catch (error) {
if (isRetryable(error)) {
throw error; // re-queue with backoff
}
await markAsFailed(payload, error);
}
}Failure Handling
Retries used exponential backoff to avoid overwhelming external services. Messages that exceeded retry limits were moved to a dead-letter queue for manual inspection.
This ensured failures were visible and actionable instead of silently lost.
Production Reality
Retries without idempotency will eventually duplicate data or lose money.
Observability
Queue depth, retry counts, processing latency, and failure reasons were tracked as first-class metrics.
This made system behavior transparent and allowed issues to be detected before users were affected.
Outcome
The system became resilient to spikes, retries, and partial outages. Failures were isolated, recoverable, and observable instead of catastrophic.