Scatter-gather
How scatter-gather fans out a request to multiple services in parallel, collects results, and merges them. Covers partial response handling, timeout strategies, and when to gate on all vs. some responses.
TL;DR -- Scatter-gather fans one request to N workers in parallel, collects their responses, and merges them into a single result. Total latency equals the slowest shard, not the sum of all shards. The hard part is handling stragglers, partial failures, and merging heterogeneous results under a deadline.
The Problem
You have data spread across multiple shards, services, or nodes. A single request needs information from all of them. The naive approach is to query each one sequentially.
Sequential fan-out is a latency disaster. If each of your 10 shards takes 50ms on average, sequential queries cost 500ms total. Even worse, your p99 latency becomes the sum of individual p99 latencies, which compounds rapidly.
The problem gets worse as you add shards. Every new shard adds its full latency to the total. You need a way to query all shards simultaneously and pay only the cost of the slowest one.
One-Line Definition
Scatter-gather fans one request to N downstream services in parallel and merges their responses into a single result, paying the latency cost of only the slowest responder.
Analogy
Think of a teacher collecting homework. The slow approach: walk to each student's desk one by one, wait for them to hand it over, then move to the next desk. The scatter-gather approach: announce "everyone put your homework on my desk." You wait only for the slowest student, not the sum of all students.
Now imagine one student is absent. Do you wait indefinitely, or do you grade what you have and mark one submission as missing? That is the timeout decision every scatter-gather system faces.
Solution Walkthrough
The Two Phases
Scatter-gather has exactly two phases.
Scatter phase: The coordinator receives a request and immediately dispatches sub-requests to all N downstream services. These calls happen concurrently. The coordinator does not wait for any response before sending the next request.
Gather phase: The coordinator collects responses as they arrive. Once it has enough responses (all N, or a quorum, or a timeout fires), it merges the results and returns a single response to the caller.
Timeout Strategies
The most critical design decision is what happens when a shard is slow or unresponsive. You have three options.
Wait-for-all gates on every shard completing. This guarantees complete results but your latency matches the slowest shard. Your p99 latency becomes the p99 of the worst-performing shard, which means one slow shard tanks your entire query.
Timeout with partial results sets a deadline. When the deadline expires, the coordinator returns whatever it has collected so far and marks the response as partial. The caller must handle incomplete data gracefully.
Hedged requests are the most sophisticated approach. After a threshold (for example, the p95 latency of a typical shard), the coordinator sends a duplicate request to an additional replica. Whichever copy responds first wins. This trades a small amount of extra load for significant tail-latency reduction.
Google's "The Tail at Scale" paper showed that hedged requests can reduce p99 latency by 40-50% while adding only ~5% extra load. The key insight: most stragglers are caused by transient issues (GC pauses, disk seeks), so a retry to a different replica almost always returns faster.
Straggler Mitigation
Stragglers are the enemy of scatter-gather. With N shards, the probability that at least one is slow grows quickly. If each shard has a 1% chance of being a straggler, then with 100 shards you are almost guaranteed at least one straggler on every request.
Straggler causes include garbage collection pauses, disk I/O contention, network congestion, hot partitions with uneven data distribution, and noisy neighbors in shared infrastructure. Hedged requests handle transient stragglers well but do not fix structural imbalances like hot shards.
Result Merging Strategies
The merge step depends entirely on what you are aggregating.
Top-K merge (search): Each shard returns its local top-K results sorted by score. The coordinator performs a K-way merge using a min-heap of size K. Time complexity is O(N * K * log K) where N is the shard count. This is how Elasticsearch and Solr merge distributed search results.
Aggregation merge (analytics): Each shard computes partial aggregates (sum, count, min, max, distinct count). The coordinator combines them. Sums and counts combine trivially. Distinct counts require probabilistic structures like HyperLogLog to merge without overcounting.
Set union (collect all): Gather all items from all shards, deduplicate by ID. Used when you need the complete dataset, not a ranked subset.
Intersection (all must agree): Only return items that appear in every shard's response. Used in multi-index search where results must match all criteria.
Top-K merge has a subtle correctness issue. If you ask each shard for top-3 and merge, you might miss a globally-top-3 document that ranked 4th on its local shard. The safe approach: ask each shard for top-K where K is the final result size, then re-rank globally. For deep pagination (page 100 of results), each shard must return K*page items, which gets expensive.
Implementation Sketch
A scatter-gather coordinator in TypeScript with timeout and hedged requests:
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.