
Sync vs async

Learn when synchronous request-response is the right tool and when async queues unlock scale that blocking calls can never achieve, with the math to defend each choice.

31 min read · 2026-03-27 · medium
Tags: sync, async, message-queues, distributed-systems, hld

TL;DR

  • Use synchronous when the caller needs the result to continue: user-facing reads, login, checkout, any interaction where the HTTP response carries actionable data.
  • Use asynchronous when the work can happen after the HTTP response: sending emails, running analytics, processing images, updating search indexes, propagating data to downstream systems.
  • The availability trap: a sync chain of three 99.9% services delivers 99.7% availability to the caller. Add five more hops and you are at 99.2%. Every sync dependency caps your SLA.
  • The latency trap: sync chains compound tail latency. A user request making four sequential sync calls inherits the P99 of each. P99 total is roughly P99_A + P99_B + P99_C + P99_D, not the average.
  • Asynchronous delivery guarantees vary by broker: Kafka preserves per-partition order and enables replay; SQS does not guarantee order. Choose based on whether ordering matters to your consumers.

The Framing

In 2019, an engineering team migrating to microservices kept every service call synchronous, because that was the pattern they knew from their monolith. Within three months they had eleven services and a forty-minute checkout outage. Service A was waiting on Service B which was waiting on Service C, and a database timeout in Service F cascaded backwards through the whole chain.

Synchronous calls look safe. You call a service, you get a response, you continue. It mirrors function calls, which are the mental model every engineer starts with. The problem is that a function call is in-process, infallible, and instant. A network call is none of those things.

The question is not "sync or async?" The question is: can my caller tolerate being blocked until this operation completes?

Decision tree: three questions about whether the caller is waiting, whether downstream failure must cascade, and whether the operation is idempotent lead to SYNC, ASYNC, or HYBRID outcomes.
Three questions to choose between sync, async, or hybrid. Most decisions resolve at the first branch.

How Each Works

Synchronous (Request-Response)

The caller sends a request and blocks. It does nothing else until the response arrives. The server processes the request, produces a result, and sends it back in the same TCP connection.

The important point: the caller's thread is occupied for the full duration. In a Node.js service, a truly synchronous call blocks the event loop; in a thread-per-request server it pins one thread from the pool. Under load, thread pools and connection pools are the actual scaling ceiling for synchronous systems.

Sync is simple, debuggable, and correct for operations where the caller needs a confirmed result before proceeding.

A client box on the left sends an HTTP request to a server on the right. The client is marked BLOCKED during processing. The server shows a processing box taking 50ms. The response arrives after the full processing time.
The caller blocks for the full server processing time. Fast operations under 100ms are fine. Deep sync chains multiply this at every hop.
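The blocking contract is easy to show in a few lines. This is a toy sketch, with `time.sleep` standing in for server-side processing; the 50ms figure matches the diagram above:

```python
import time

def handle_request() -> str:
    """Server-side work; the caller is blocked for this entire duration."""
    time.sleep(0.05)  # simulate 50ms of processing
    return "result"

start = time.monotonic()
result = handle_request()   # the caller's thread does nothing else until this returns
elapsed = time.monotonic() - start

# The caller's observed latency equals the server's processing time.
```

In a real service the same shape appears as an HTTP client call with a timeout; the measurement is what matters: the caller pays the full processing time on every hop.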

Asynchronous (Event-Driven / Queue-Based)

The caller publishes a message to a broker, gets an acknowledgement, and moves on. A separate consumer processes the message later. The caller receives no immediate result unless it implements polling or a callback mechanism.

This decoupling is the entire point: producers and consumers are no longer availability-coupled. The queue buffers demand spikes, enables retries, and gives producers a clean failure mode (if consumers are slow, the queue depth grows; producers do not stall).

Async is harder to build correctly, harder to debug without proper tooling, and the right choice whenever the work does not need to complete before the caller's response.

A producer sends a publish arrow to a message queue in the center. The queue sends an immediate ACK back. A consumer on the right receives a deliver arrow from the queue and processes later. The producer shows 202 Accepted immediately.
The producer returns immediately after the queue ACKs. Consumer processes in its own time. The queue is the decoupling point: producers and consumers are completely independent.
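A minimal sketch of the decoupling, using Python's in-process `queue.Queue` as a stand-in for a real broker (persistence, redelivery, and partitioning are Kafka/SQS concerns out of scope here):

```python
import queue
import threading
import time

broker = queue.Queue()   # stands in for Kafka/SQS
processed = []

def consumer() -> None:
    """Runs independently of the producer; drains the queue at its own pace."""
    while True:
        msg = broker.get()      # blocks until a message is delivered
        time.sleep(0.05)        # simulate slow downstream work
        processed.append(msg)
        broker.task_done()

threading.Thread(target=consumer, daemon=True).start()

# Producer: publish and return immediately -- no waiting on the consumer.
start = time.monotonic()
broker.put({"event": "order_placed", "order_id": 42})
producer_latency = time.monotonic() - start

broker.join()   # demo only: wait until the consumer has caught up
```

The producer's latency is the enqueue time, not the processing time; that gap is exactly the "202 Accepted immediately" in the diagram.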

Head-to-Head Comparison

| Dimension | Synchronous | Asynchronous |
|---|---|---|
| Caller waits? | Yes, thread occupied until done | No, caller acked immediately |
| Latency | Equal to server processing time | Near-zero for the caller (queue ACK only) |
| Throughput | Bounded by thread pool and connection pool | Bounded by queue throughput and consumer count |
| Failure propagation | Downstream failure propagates to caller | Queue buffers failure; consumer retries independently |
| Consistency | Strong: response confirms completion | Eventual: consumer processes at some later time |
| Observability | Trivial: one trace covers one request | Hard: trace context must be propagated through message headers |
| Recovery | Caller retries the whole request | Consumer retries its own message; DLQ for unrecoverable failures |
| Ordering guarantees | Implicit (single request, single response) | Queue-dependent: Kafka preserves per-partition order; SQS standard queues do not |
| Backpressure | None by default; clients pile on until threads exhaust | Built-in: Kafka producers block when their local send buffer fills; SQS monitored via queue depth |
| Developer model | Familiar, reads like function calls | Explicit state machines and idempotency requirements |
| Debugging | Straightforward, trace covers one request | Harder, correlated by trace_id across services and time windows |

The fundamental tension here is developer simplicity vs. operational resilience. Synchronous calls are easier to reason about and debug. Asynchronous calls are harder to build correctly (and substantially harder to debug without tracing infrastructure), but they decouple failure domains in ways that synchronous calls simply cannot.


When Sync Wins

If the user is waiting for the response, sync is almost always correct. More often than most "async-first" articles admit, the default answer is sync.

Use synchronous when:

  • The API response carries data the caller needs to continue (e.g., a user submitting a payment needs to know if the charge succeeded before seeing the confirmation page).
  • The operation is fast and bounded (P99 under 200ms). Adding a queue for a 5ms database read adds latency and complexity with zero benefit.
  • You need strong consistency: if two services must agree on a result before proceeding (checking and reserving inventory for a purchase), sync within a transaction boundary is the honest answer.
  • You are calling within a bounded context. Inter-process calls within the same domain can often stay synchronous because failure modes are simpler and latency budgets are tight.
  • The downstream is internal and reliable (99.9% available or better). Async is overkill when the thing you are calling rarely fails and always responds fast.

Sync for user-facing reads and writes, sync for internal reliable calls, sync for fast operations that need a confirmed result. Those three cover the large majority of API calls in most systems.

Recall the availability trap: every sync dependency multiplies your failure rate. Here is where async breaks that cycle.
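The multiplication is worth checking directly; the numbers from the TL;DR fall out of one line of arithmetic:

```python
def chain_availability(per_service: float, hops: int) -> float:
    """Availability seen by the caller of N sequential sync dependencies:
    every hop must succeed, so the probabilities multiply."""
    return per_service ** hops

print(f"3 hops: {chain_availability(0.999, 3):.4f}")  # ~0.9970
print(f"8 hops: {chain_availability(0.999, 8):.4f}")  # ~0.9920
```

Replacing a hop with an async publish removes its factor from this product: the caller's availability no longer depends on the consumer being up.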


When Async Wins

Use asynchronous when:

  • The work can happen after the HTTP response (email confirmations, push notifications, analytics events, cache invalidation propagation, downstream data sync).
  • Downstream processing is slow or unpredictable: generating a PDF, transcoding a video, running ML inference. These take seconds or minutes. Sync calls to them are untenable.
  • You must absorb traffic spikes. A queue buffers demand so consumers process at a sustainable rate; without it, spikes hit the database or downstream service directly.
  • Consumers may be temporarily unavailable. A queue persists messages while consumers are restarted or redeployed; a sync call to an unavailable service fails immediately at the caller.
  • Fan-out is required: one event triggering ten downstream consumers is clean with a queue and brittle with ten sequential or parallel sync calls.
  • You need at-least-once delivery with retries. Consumer-side retry logic is clean; caller-side retry of a partially-completed sync call is dangerous.

When you reach for async, you must also implement a dead-letter queue (DLQ). Any async system without a DLQ is incomplete. When a consumer fails after N retries, the message must go somewhere. A DLQ catches unprocessable messages, enables manual replay or investigation, and prevents a single bad message from blocking the rest of the partition.
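The retry-then-DLQ flow can be sketched with two in-memory queues; `MAX_RETRIES` and the message shape are illustrative, and `process` is a hypothetical handler that always fails on a malformed payload:

```python
import queue

MAX_RETRIES = 3
main_queue = queue.Queue()
dead_letters = queue.Queue()   # the DLQ: unprocessable messages end up here

def process(msg: dict) -> None:
    if msg.get("payload") is None:
        raise ValueError("malformed message")   # a poison pill: never succeeds

def consume_one() -> None:
    msg = main_queue.get()
    attempts = msg.setdefault("attempts", 0)
    try:
        process(msg)
    except Exception:
        msg["attempts"] = attempts + 1
        if msg["attempts"] >= MAX_RETRIES:
            dead_letters.put(msg)    # route aside so the queue keeps draining
        else:
            main_queue.put(msg)      # redeliver for another attempt

main_queue.put({"id": "evt-1", "payload": None})   # poison pill
while not main_queue.empty():
    consume_one()
```

Real brokers offer native variants of this: SQS redrive policies move messages to a DLQ after `maxReceiveCount` deliveries, and Kafka deployments typically pair a retry topic with a DLQ topic.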


The Nuance

Here is the honest answer on the "sync vs async" framing: it is a false binary in production systems.

Every high-scale system uses sync for user-facing paths (users are waiting) and async for background propagation. The user clicking "Buy" gets a synchronous payment confirmation. But the events triggered by that purchase (warehouse pick list, email receipt, loyalty points, analytics, recommendation model update) all go asynchronous.

The engineering challenge is not choosing one. It is knowing exactly which operations belong to which category, and what happens at the boundary.

The Hybrid Pattern: 202 Accepted with Polling

When an operation is user-initiated but genuinely long-running (PDF generation, bulk data export, video processing), neither pure sync nor fire-and-forget is right. Sync would time out or hang the user's request; async would leave no way to know when the job completes. Return 202 Accepted with a job ID, process asynchronously, and let the client poll or receive a webhook. This unblocks the user immediately while preserving the contract that they will eventually get a result.

sequenceDiagram
    participant C as 👤 Client
    participant API as ⚙️ API Service
    participant Q as 📨 Job Queue
    participant W as 👷 Worker
    participant DB as 🗄️ Database

    Note over C,DB: Long-running async job (report generation)
    C->>API: POST /reports/generate
    API->>Q: enqueue(job_id=abc123)
    Q-->>API: ACK
    API-->>C: 202 Accepted { job_id: "abc123" }

    Note over C,DB: Client polls at reasonable interval

    C->>API: GET /jobs/abc123
    API-->>C: 200 OK { status: "running", progress: 42% }

    activate W
    W->>Q: consume(job_id=abc123)
    W->>DB: heavy computation
    DB-->>W: results
    W->>DB: UPDATE jobs SET status = "complete"
    deactivate W

    C->>API: GET /jobs/abc123
    API-->>C: 200 OK { status: "done", result_url: "/results/abc123" }

This handles the case cleanly: the user gets immediate feedback, the server is not holding an open connection, the work completes in its own time, and the client knows when it is done.
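The sequence above can be sketched with threads and a dict; this is a toy model (a real service would persist jobs in a database and put the work on a durable queue), and the endpoint-shaped function names are illustrative:

```python
import threading
import time
import uuid

jobs: dict[str, dict] = {}   # in-memory job store; a real service would use a DB

def submit_report() -> tuple[int, dict]:
    """POST /reports/generate -- enqueue the job, return 202 immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "running"}
    threading.Thread(target=_run_job, args=(job_id,), daemon=True).start()
    return 202, {"job_id": job_id}

def _run_job(job_id: str) -> None:
    time.sleep(0.05)  # simulate heavy computation
    jobs[job_id] = {"status": "done", "result_url": f"/results/{job_id}"}

def poll(job_id: str) -> tuple[int, dict]:
    """GET /jobs/{job_id} -- the client checks status at its own pace."""
    return 200, jobs[job_id]

status_code, body = submit_report()   # returns before the work is done
job_id = body["job_id"]

time.sleep(0.1)                       # client waits, then polls
_, result = poll(job_id)
```

Note the contract: the 202 response carries only the job ID; all state questions go through the polling endpoint, so the server never holds a connection open across the long-running work.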


Real-World Examples

Uber's dispatch system: sync for confirmation, async for everything else

The moment a driver accepts a ride, Uber needs to confirm it to the rider in under 500ms. That core confirmation is synchronous: the rider's app is waiting and the UX is catastrophically bad if there is a 10-second delay. But everything triggered by that acceptance (driver rating updates, route calculation, ETA propagation, analytics, loyalty point accrual) happens asynchronously through Kafka topics. Uber processes billions of Kafka messages per day. The synchronous surface area is ruthlessly small by design, scoped only to operations where a human is watching the screen and waiting.

Stripe: synchronous charge, asynchronous webhooks

When you call POST /charges, Stripe processes the charge synchronously (because you need to know if it succeeded before continuing to show a confirmation page to your user). Stripe returns a result within seconds. But every downstream event from that charge (revenue recognition, fraud model training, dispute monitoring, statement generation) is delivered asynchronously via webhook. Stripe keeps payment processing synchronous because the caller needs a confirmed result. Their downstream is all async for exactly the reasons above: resilience, fan-out, and decoupled failure domains.

Amazon's order pipeline

When you click "Place Order" on Amazon, the confirmation you see is synchronous: Amazon's servers have reserved your inventory and charged your payment card before responding (because you need to know if it succeeded before continuing). But the order then enters an event-driven pipeline: warehouse routing, seller notification, fulfillment, shipment tracking, and metrics all happen as async message chains. The rule Amazon discovered: sync until the user needs an answer, then async for everything that can happen after. Synchronous checkout confirmation takes 3-5 seconds. Synchronous propagation to all fourteen downstream services would make checkout take thirty seconds.


How This Shows Up in Interviews

My recommendation: when an interviewer asks you to design a system, explicitly separate sync paths from async paths within the first ten minutes. Draw the boundary. Show what goes into a queue. Show what gets a direct HTTP response. Narrating this distinction is one of the clearest signals that you understand distributed systems at a non-trivial level.

The phrase that earns senior-level credit

When you add async processing in a design, say: "These operations (email, analytics, and search indexing) don't block the user response. I'll publish an event to a Kafka topic here and fan out to three consumers. The caller gets 200 OK immediately; these all process within a few seconds." That one sentence shows you understand decoupling, fan-out, and latency implications simultaneously.

The poison pill you did not anticipate

A message that never succeeds on retry (malformed payload, missing foreign key, illegal state transition) is a poison pill. After N retries, it blocks the entire partition. It must route to a dead-letter queue or topic. If you configure Kafka consumers without addressing this, expect this question in interviews.

Depth expected at senior/staff level:

  • Explain the availability math: sync chains degrade as 0.999^N. Name the number for whatever chain you are designing.
  • Know the at-least-once delivery contract: "Kafka guarantees at-least-once delivery, so every consumer must be idempotent. I will use the message's event_id as an idempotency key."
  • Address ordering guarantees: SQS standard queues do not guarantee order (FIFO queues do, at lower throughput). Kafka guarantees per-partition order. If you need globally ordered processing, name the constraint and the fix.
  • Know the difference between choreography (services react to events) and orchestration (a central saga drives the workflow), and the debugging trade-off of each.
  • Address the observability gap: trace context must be propagated through message headers or your async operations are invisible in distributed tracing.
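The at-least-once idempotency contract above can be sketched as a minimal consumer; an in-memory set stands in for the idempotency-key table, and the event shape is illustrative:

```python
processed_event_ids: set[str] = set()   # in production: a DB table or Redis set
side_effects: list[str] = []

def handle(event: dict) -> None:
    """At-least-once delivery means handle() may see the same event twice."""
    event_id = event["event_id"]         # the idempotency key from the message
    if event_id in processed_event_ids:
        return                           # duplicate: ACK and discard
    side_effects.append(event["action"]) # the non-idempotent work, done once
    processed_event_ids.add(event_id)

evt = {"event_id": "evt-42", "action": "send_welcome_email"}
handle(evt)
handle(evt)   # redelivered duplicate -- no second email
```

In a real consumer the key check and the side effect should commit atomically (or the check must happen in the same transaction as the write), otherwise a crash between them reintroduces duplicates.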

Common follow-up questions:

| Interviewer asks | Strong answer |
|---|---|
| "How does your service handle a downstream outage?" | "Sync calls get a circuit breaker. After N failures it opens and fails fast with a cached or default response. Async consumers retry independently with exponential backoff; the queue accumulates without producer impact." |
| "What if your consumer falls behind and the queue fills up?" | "Backpressure: Kafka producers block once their local send buffer fills. SQS queues grow unboundedly by default, so I would alarm on queue depth and trigger consumer auto-scaling before the backlog causes time-sensitive message staleness." |
| "How do you prevent duplicate processing in your async pipeline?" | "Consumers must be idempotent. I use the event's unique ID as an idempotency key stored in a separate table. On message receipt, check the key before processing; if already processed, ACK and discard." |
| "How would you add observability to your async pipeline?" | "Propagate trace context (trace_id, span_id) as headers in every message. Consumers read these headers and continue the trace as a child span. Without this, async operations are invisible in Jaeger or Zipkin." |
| "What's the difference between a queue and an event log?" | "A queue is for task distribution: messages are consumed once and removed (SQS, RabbitMQ). An event log is append-only: consumers read independently and can replay from any offset (Kafka, Kinesis). Use queues for work distribution, logs for event sourcing and replay." |
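The queue-versus-log distinction can be made concrete with an append-only list standing in for a Kafka-style log (a toy model, not a real client API): consumers keep their own offsets and can re-read at will.

```python
# Event log: append-only; nothing is removed on read.
log: list[dict] = []

def append(event: dict) -> None:
    log.append(event)

def read_from(offset: int) -> list[dict]:
    """Any consumer can (re)read from any offset, independently of others."""
    return log[offset:]

append({"type": "order_placed"})
append({"type": "order_shipped"})

consumer_a = read_from(0)   # reads everything
consumer_b = read_from(1)   # joined later, sees only the second event
replayed   = read_from(0)   # replay from the beginning: events are still there
```

A queue, by contrast, would hand `order_placed` to exactly one consumer and delete it; replay and late-joining consumers are what make the log model suit event sourcing.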


Quick Recap

  1. Synchronous holds the caller's thread until the server responds; asynchronous lets the caller return immediately and processes work later via a queue or event stream.
  2. Sync chains compound failure: three 99.9% services in sequence deliver 99.7% availability to the caller, not 99.9%. Eight services deliver 99.2%.
  3. Sync chains compound tail latency: the P99 of a four-hop chain is approximately the sum of each hop's P99, not the average.
  4. Async requires at-least-once delivery handling: every consumer must be idempotent, with an idempotency key checked before processing to prevent duplicate side effects.
  5. Dead-letter queues are non-negotiable: without one, a single malformed message blocks all subsequent messages in the same Kafka partition or SQS queue.
  6. Most production systems are hybrid: sync for user-facing paths with a confirmed result, async for background propagation and fan-out.
  7. Architectural complexity is the primary cost of async systems (idempotency, DLQs, trace propagation); the payoff is decoupled failure domains, a trade-off usually worth accepting for inter-service communication.

Related Trade-offs

  • Message queues: Deep dive into Kafka, SQS, and RabbitMQ internals: delivery guarantees, ordering, partitioning, and consumer group scaling are the implementation layer behind every async design.
  • Microservices: The architectural context where sync vs. async matters most. Every inter-service call is either a sync dependency or an async decoupling point, and the choice between them determines your cascading failure surface.
  • Caching: The fastest way to reduce sync latency on read-heavy paths, often making async unnecessary for hot database reads by eliminating the database round-trip entirely.
  • Rate limiting: How to apply backpressure at the front door when async queues are filling faster than consumers can drain them, preventing queue depth from growing unboundedly.

