No backpressure anti-pattern
Learn why unbounded queues mask producer-consumer rate mismatches until they cause OOM crashes, and how backpressure, load shedding, and bounded queues prevent it.
TL;DR
- Backpressure is the mechanism by which a slow consumer signals to a fast producer to slow down. Without it, producers keep producing and queues grow unbounded.
- An unbounded queue is a delayed crash: memory fills up gradually, then the process OOMs at the worst possible time (peak traffic, critical path).
- The three responses to backpressure: slow down (rate-limit the producer), shed load (drop or reject excess work), scale up (add consumers). Each has a different cost and latency implication.
- The most dangerous anti-pattern is silently swallowing backpressure: a queue that "just grows" looks healthy until it doesn't.
The Problem
It's 2:47 a.m. on Black Friday. Your notification service receives events from Kafka and sends emails. During normal operation, events arrive at 500/second and you process 500/second. Perfect balance. Your monitoring dashboards are green. Queue depth hovers near zero.
Then traffic spikes. Events arrive at 5,000/second. Your email provider rate-limits you to 600/second. Your in-memory processing queue starts growing: 4,400 events per second being added, 600 per second being removed.
After 10 minutes, that's 4,400 events/second net growth × 600 seconds = 2,640,000 pending events. Your notification service's heap is at 8GB. It OOMs and crashes. On restart, it picks up where Kafka left off, with events still arriving at 5,000/second. It crashes again in 10 minutes. Your on-call engineer wakes up to a PagerDuty storm: the service has restarted 6 times in the last hour.
The queue was the problem. It had no maximum size, so it silently absorbed every message the producer sent. No alert fired, no error logged, no metric crossed a threshold. The system looked healthy right up until the moment it died.
Meanwhile, customers received zero emails for a 10-minute window, then the 2.6-million-message backlog flooded out once you scaled the consumer. Users got 50 "your order was placed" emails. Support tickets spiked. Twitter threads appeared.
The root problem was no backpressure: the consumer had no mechanism to signal "slow down" to the producer, and the queue had no bound. The queue was the silent failure point, dutifully accepting every message without raising a single alarm.
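The arithmetic of this incident generalizes into a quick back-of-envelope estimate. A minimal sketch, assuming roughly 2 KB per queued event (a made-up figure; plug in your own numbers):

```typescript
// Back-of-envelope: seconds until an unbounded queue exhausts the heap.
function secondsToOom(
  produceRate: number, // events/second arriving
  consumeRate: number, // events/second processed
  eventBytes: number,  // approximate size of one queued event (assumed)
  heapBytes: number    // heap available for the queue
): number {
  const netRate = produceRate - consumeRate; // events/second accumulating
  if (netRate <= 0) return Infinity;         // queue drains; no OOM
  return heapBytes / (eventBytes * netRate);
}

// 5,000/s in, 600/s out, ~2 KB per event, 8 GB heap:
const seconds = secondsToOom(5_000, 600, 2_048, 8 * 1024 ** 3);
console.log(`~${Math.round(seconds / 60)} minutes to OOM`);
// prints "~16 minutes to OOM"
```

Fifteen-odd minutes is exactly the shape of the incident above: long enough that nothing looks wrong, short enough that nobody reacts in time.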
Why It Happens
Backpressure is upstream-visible consumer slowness. When your consumer falls behind, what happens upstream? Without backpressure, the answer is "nothing." The producer keeps producing, the queue keeps growing, and the system looks healthy right up until it OOMs.
Teams end up here for a few individually-reasonable reasons:
"We'll scale the consumers before the queue gets too big." Manual scaling doesn't work at 3 a.m. when traffic spikes. By the time your on-call engineer wakes up, the queue has 4 million messages and the process is dead. Auto-scaling helps, but it takes 1-3 minutes to spin up new consumers. Without a bounded queue, the process may OOM before scaling completes.
"Unbounded in-memory queues are faster than bounded ones." Marginally, but an unbounded in-memory queue is a delayed crash. You're trading a little latency now for an OOM later. The queue absorbs all excess work invisibly, hiding the rate mismatch until it's catastrophic.
"Our consumers are fast enough." They are, until they aren't. A downstream dependency slows down, a database connection pool fills up, or traffic doubles on a holiday. I've watched teams run for months with perfect producer-consumer balance, then lose an entire service on the first real traffic spike. Backpressure is insurance for the moment "fast enough" stops being true.
"We haven't had a problem yet." Survivorship bias. The absence of an OOM doesn't mean the queue is bounded. It means you haven't had a large enough traffic spike yet. Check your queue's maximum observed depth. If it's been growing month over month, you're trending toward a crash.
"The queue library doesn't have a size limit option." Many in-memory queue implementations default to unbounded. If you never set a max size, you never get backpressure. The default is the anti-pattern.
Real-world examples
This pattern shows up everywhere queues exist:
- HTTP request queues in web servers. Tomcat, Netty, and Node.js all have configurable request queue sizes. The default is often unbounded or very large. During a traffic spike, the server accepts thousands of connections it can't process, eats memory, and eventually crashes.
- Kafka consumer applications. The consumer reads messages into an in-memory buffer for processing. If the processing pipeline is slower than the read rate, the buffer grows until OOM. Kafka's max.poll.records and pause()/resume() are the backpressure mechanisms.
- Thread pool task queues. A ThreadPoolExecutor with an unbounded LinkedBlockingQueue will accept infinite tasks. When the pool is saturated, tasks pile up in memory. Use a bounded queue with a rejection policy (CallerRunsPolicy or AbortPolicy).
- gRPC streaming. A server-streaming RPC with a fast producer and slow consumer fills gRPC's internal buffers. Without flow control, the server allocates memory for every buffered message.
The common thread: every system that moves data between components at different speeds needs a bounded buffer and a strategy for when that buffer fills. If you grep your codebase for "new Queue()" or "new LinkedList()" used as a work buffer, check whether there's a size limit. If there isn't, you've found the anti-pattern.
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| Queue depth growing over time | Consumer can't keep up with producer | Monitor Kafka consumer lag or queue size metrics |
| Memory climbing monotonically | Unbounded in-memory queue accumulating | Track process heap usage (Grafana/Prometheus) |
| Latency increasing on the write path | Producer blocking on full bounded queue | Measure p99 enqueue latency |
| OOM crashes after sustained traffic | Queue was unbounded, hit heap limit | Check crash logs for OutOfMemoryError after spikes |
| Consumer processing rate < producer rate | Persistent rate mismatch, not just a burst | Compare messages.produced/sec vs messages.consumed/sec |
| Periodic restart loops | Service crashes, restarts, re-reads backlog, crashes again | Check restart count and uptime metrics |
In Kafka, the key metric is consumer lag: the number of messages produced but not yet consumed. Alert when consumer lag increases over a 5-minute window. I've found that a lag-based alert catches problems 10-15 minutes before an OOM does.
Monitoring queries to set up now
# Prometheus: alert when consumer lag is increasing (lag is a gauge, so use delta, not rate)
delta(kafka_consumer_lag[5m]) > 0
# Prometheus: alert when process memory exceeds 80% of limit
process_resident_memory_bytes / container_memory_limit_bytes > 0.8
# Grafana: dashboard panel for queue depth over time
kafka_consumer_lag{consumer_group="notification-service"}
If you're using an in-memory queue without Kafka, instrument it yourself. Export a gauge metric for queue.size and alert when it exceeds 50% of your heap budget. The earlier you catch the growth, the more options you have.
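A minimal sketch of that self-instrumentation. The gauge and threshold wiring here are illustrative stand-ins for a real metrics client such as prom-client; the class and member names are assumptions:

```typescript
// A queue that exposes its own depth as a gauge-style metric.
class InstrumentedQueue<T> {
  private items: T[] = [];

  constructor(private readonly alertThreshold: number) {}

  push(item: T): void {
    this.items.push(item);
  }

  shift(): T | undefined {
    return this.items.shift();
  }

  // Export this value as the queue.size gauge on a scrape interval.
  get size(): number {
    return this.items.length;
  }

  // Alert condition: depth above the configured budget (e.g. 50% of heap).
  get overBudget(): boolean {
    return this.items.length > this.alertThreshold;
  }
}
```

Scrape size periodically and alert on overBudget; catching the growth early leaves you time to block, shed, or scale before the heap fills.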
The Fix
The three responses to backpressure are: slow down the producer, shed excess load, or scale consumers. In practice, most production systems combine two or three of these.
The important insight is that all three responses require a bounded queue. Without a bound, the system never triggers any response. The queue absorbs everything, the producer never slows down, the load is never shed, and auto-scaling is never triggered. Bounding the queue is the prerequisite that makes every other response possible.
Fix 1: Bounded queues with blocking
Set a maximum queue size. When the queue is full, the producer blocks until space is available. This propagates pressure upstream, which is exactly what you want when the producer is within your control.
In Kafka, this is equivalent to using pause() on the consumer when your processing buffer is full, and resume() when space is available. The consumer stops fetching new messages until it can handle them. For HTTP-based producers, return 503 Service Unavailable with a Retry-After header.
// Helper: resolve after ms milliseconds.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class BoundedQueue<T> {
  private queue: T[] = [];

  constructor(private readonly maxSize: number) {}

  async enqueue(item: T): Promise<void> {
    // Backpressure: block the producer until the consumer frees space.
    while (this.queue.length >= this.maxSize) {
      await sleep(10);
    }
    this.queue.push(item);
  }

  dequeue(): T | undefined {
    return this.queue.shift();
  }
}
Trade-off: Blocking the producer increases upstream latency. If your producer is an API server, users see slower responses. This is acceptable when "slow" is better than "crash."
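The pause()/resume() variant described for Kafka can be sketched with high/low watermarks. The PausableConsumer interface below is hypothetical, standing in for whatever your client library exposes (kafkajs, for example, has similar calls):

```typescript
// Hypothetical interface over a consumer that can stop/restart fetching.
interface PausableConsumer {
  pause(): void;
  resume(): void;
}

// Pause fetching above the high watermark, resume below the low one.
// Two thresholds avoid rapid pause/resume flapping near a single limit.
class FlowController {
  private paused = false;

  constructor(
    private readonly consumer: PausableConsumer,
    private readonly highWater: number, // pause at or above this depth
    private readonly lowWater: number   // resume at or below this depth
  ) {}

  onQueueDepth(depth: number): void {
    if (!this.paused && depth >= this.highWater) {
      this.consumer.pause();  // stop fetching until we catch up
      this.paused = true;
    } else if (this.paused && depth <= this.lowWater) {
      this.consumer.resume(); // safe to fetch again
      this.paused = false;
    }
  }
}
```

The gap between the watermarks is a design choice: wider means fewer pause/resume transitions, at the cost of deeper queue swings.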
Fix 2: Load shedding (drop or reject)
When the queue is full, reject the incoming item with a 429 Too Many Requests or drop it deterministically (e.g., drop oldest). This is the right response when you can't slow the producer down (external clients, third-party webhooks).
async function enqueue(item: Event): Promise<void> {
  if (queue.length >= MAX_QUEUE_SIZE) {
    metrics.increment('events.dropped'); // make the shedding visible
    return; // shed load: drop the event
  }
  queue.push(item);
}
Trade-off: You lose data. For notification emails, losing a few during a spike is acceptable. For financial transactions, it's not. Choose shedding only when losing some messages is safer than losing all of them.
When shedding, always emit a metric (events.dropped) so you know it's happening. Silent shedding is better than silent queuing, but visible shedding is best.
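The snippet above rejects the newest event; the drop-oldest variant mentioned earlier keeps the freshest data instead, which suits feeds where newer events supersede older ones. A sketch (the function name and metric call are illustrative):

```typescript
// Drop-oldest shedding: evict the head of the queue to make room,
// so the freshest events survive. Returns the dropped item, if any.
function enqueueDropOldest<T>(
  queue: T[],
  item: T,
  maxSize: number
): T | undefined {
  let dropped: T | undefined;
  if (queue.length >= maxSize) {
    dropped = queue.shift(); // shed load: evict the oldest entry
    // metrics.increment('events.dropped'); // always count what you drop
  }
  queue.push(item);
  return dropped;
}
```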
Fix 3: Auto-scale consumers
Monitor queue depth and auto-scale consumer instances when lag exceeds a threshold. This is the elastic response, effective but slow (30-120 seconds for new containers). Always combine with bounded queues so the system survives the scaling delay.
The auto-scaling loop: consumer lag exceeds threshold, orchestrator spins up new consumer pods, new pods join the consumer group, Kafka rebalances partitions across consumers, throughput increases, lag decreases. The entire cycle takes 1-3 minutes. During that window, the bounded queue and load shedding policy keep the system alive.
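Sizing the bounded queue to survive that window is simple arithmetic: net inflow rate times the scaling delay. A sketch using the incident's numbers (the 120-second delay is an assumption):

```typescript
// Minimum queue capacity needed to absorb a burst while auto-scaling
// spins up new consumers.
function capacityForScalingDelay(
  produceRate: number,  // events/second in during the spike
  consumeRate: number,  // events/second out before scaling completes
  delaySeconds: number  // time for new consumers to come online
): number {
  return Math.max(0, (produceRate - consumeRate) * delaySeconds);
}

// 5,000/s in, 600/s out, 120 s to scale:
const needed = capacityForScalingDelay(5_000, 600, 120); // → 528000
```

If that capacity doesn't fit in memory, the queue bound must be smaller than the burst, and the shedding or blocking policy covers the difference.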
// Auto-scaling rule (Kubernetes HPA config concept)
const scalingConfig = {
metric: 'kafka_consumer_lag',
threshold: 10_000, // scale when lag > 10K messages
scaleUpCooldown: '60s',
maxReplicas: 20,
};
Trade-off: Infrastructure cost. You pay for the headroom to absorb spikes. But it's cheaper than an OOM crash during peak traffic.
The bottom line: bounded queues are non-negotiable. The question is only what happens when they fill up. Blocking, shedding, or scaling are all valid. An unbounded queue that "just grows" is never valid in a long-running production service.
Which fix to use?
- Producer under your control and extra latency is tolerable: block (Fix 1).
- External producers you can't slow down and some data loss is acceptable: shed load (Fix 2).
- Sustained rate growth and budget for headroom: auto-scale (Fix 3), always on top of a bounded queue.
Severity and Blast Radius
No backpressure is a high-severity anti-pattern. When it triggers, there is no graceful degradation. It crashes.
The failure mode is binary: the system either has enough memory for the queue, or it doesn't. There's no slow decline, no warning period where things get gradually worse. Memory usage climbs linearly until it hits the limit, then the process dies instantly. This makes it particularly dangerous because traditional alerting (latency thresholds, error rate thresholds) doesn't fire until the crash has already happened.
The blast radius depends on what the queue feeds. If the overloaded queue is in your notification service, you lose emails for a window. If it's in your payment processing pipeline, you lose transactions. If it's in your API gateway's request queue, the entire service goes down.
Recovery is fast once you add bounds (minutes to deploy a config change), but the OOM crash itself can cascade. If the crashed service holds database connections, those connections don't release cleanly. The database hits its connection limit, and now other services can't connect either. I've seen a single unbounded queue take down three services in a chain this way.
| Impact | Without backpressure | With backpressure |
|---|---|---|
| Queue growth | Unbounded, silent | Bounded, visible |
| Failure mode | OOM crash (binary) | Slow responses or 429s (graceful) |
| Recovery time | Minutes to hours (restart loops) | Seconds (queue drains) |
| Blast radius | Can cascade to dependent services | Contained to the overloaded path |
| Data loss risk | High (crash loses in-flight messages) | Low (messages rejected, not lost) |
When It's Actually OK
- Batch processing with finite input. If you're reading from a file with a known record count, an unbounded in-memory queue won't OOM because the input itself is bounded. The queue will drain.
- Development and testing. Unbounded queues simplify local development. You don't want backpressure slowing your test suite. Just don't let the dev config leak into production.
- Short-lived processes. A CLI tool that processes a CSV and exits doesn't need queue bounds. The process lifetime is the bound. If the input is finite and small relative to memory, the queue drains before it becomes a problem.
- Consumer is provably faster. If your consumer processes faster than any producer can emit (hardware-limited, not software-limited), backpressure adds complexity without value. But "provably" is doing a lot of work in that sentence. Network hiccups, GC pauses, and dependency slowdowns can all temporarily slow a "provably fast" consumer.
The key question to ask: "What happens to this queue if the consumer stops for 5 minutes?" If the answer is "the process runs out of memory," you need backpressure. If the answer is "the queue drains when the consumer resumes and the total data fits in memory," you might be OK without it.
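The five-minute question can be answered with the same back-of-envelope arithmetic; the ~2 KB event size is again an assumption:

```typescript
// Does the backlog from a consumer outage of a given length fit in memory?
function backlogFits(
  produceRate: number,   // events/second arriving during the outage
  outageSeconds: number, // how long the consumer is stopped
  eventBytes: number,    // approximate size of one queued event (assumed)
  heapBytes: number      // heap budget for the queue
): boolean {
  return produceRate * outageSeconds * eventBytes <= heapBytes;
}

// 500/s for 5 minutes at ~2 KB each is ~300 MB: fits in an 8 GB heap.
// 5,000/s for an hour at ~2 KB each is ~37 GB: you need backpressure.
```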
How This Shows Up in Interviews
Any time your design includes a queue feeding a worker pool, the interviewer may ask: "What happens when the worker pool falls behind?" The wrong answer is "we'd scale up." The right answer addresses the mechanism.
A strong answer sounds like this: "The queue would fill up, but I'd set a bounded queue size with load shedding and a consumer lag alert. If consumer lag exceeds X minutes, auto-scale the consumer pool. If the queue is full before auto-scaling kicks in, we return 429 to the producer so upstream callers know to back off."
This shows three things interviewers look for: you've thought about steady-state vs burst conditions, you have a concrete response strategy (not just "scale up"), and you understand that queues need explicit bounds.
If the interviewer pushes further with "What if you can't drop messages?", describe the blocking backpressure approach: the producer waits for queue space, upstream latency increases, but no data is lost. Then mention the trade-off: users see slower responses during the spike, but the system stays alive.
Always bound your queues
An unbounded queue is a ticking time bomb. It masks production rate vs consumption rate mismatches until the process crashes. Set an explicit bound and decide what to do when it's hit: block, drop, or scale. Any of the three is better than ignoring the problem.
Quick Recap
- Without backpressure, a fast producer fills an unbounded queue until the process OOMs.
- The crash is delayed but certain. The queue grows invisibly until it hits the heap limit at the worst possible time.
- Teams end up here because the defaults are unbounded and the system looks healthy until it crashes.
- The three responses to consumer lag: slow down the producer (block), shed load (drop), or scale the consumer (add workers). Combine two or three.
- Bounded queues make backpressure visible: when the queue is full, you have to make a decision. Unbounded queues defer that decision until a crash.
- Kafka consumer lag is the observability signal: alert when lag is increasing, auto-scale when it exceeds a time threshold.
- Real-world examples include HTTP request queues, Kafka consumer buffers, thread pool task queues, and gRPC streaming buffers.
- The most dangerous variant is silent swallowing: a queue that grows without any metric alerting on it.