Competing consumers
How to scale message processing by running multiple consumers against the same queue. Covers partition assignment, consumer group rebalancing, at-least-once delivery, and idempotency requirements.
TL;DR
- Competing consumers scales message processing by running multiple workers against the same queue, where each message is processed by exactly one worker.
- In SQS, scaling is trivial: add more workers and the queue distributes messages automatically via visibility timeouts.
- In Kafka, maximum parallelism equals the partition count. You can't have more active consumers than partitions within a consumer group.
- All competing consumer systems deliver at-least-once, so workers must be idempotent: processing the same message twice produces the same result as processing it once.
- Consumer group rebalancing (Kafka) and visibility timeout tuning (SQS) are the two operational knobs that determine reliability and throughput.
The Problem
Your notification service sends emails when orders ship. During normal hours, one worker process handles the load fine. Then Black Friday hits. Order volume jumps 10x. Your single worker's queue depth climbs from 0 to 50,000 messages in minutes. Email delivery latency goes from seconds to hours.
You can't make the worker faster (it's limited by the SMTP server's throughput per connection). You need more workers. But if two workers pull the same message, the customer gets two shipping emails. If a worker crashes mid-processing, the message must not be lost.
The queue acts as a load balancer for messages. Each message goes to exactly one worker. Workers don't know about each other. A crashed worker's unacknowledged messages are re-delivered to healthy workers.
This is the competing consumers pattern: multiple workers "compete" for messages from a shared queue, and the queue guarantees each message is processed by exactly one consumer at a time.
The pattern solves three problems simultaneously:
- Throughput: N workers process N times more messages than a single worker.
- Resilience: a crashed worker's messages are re-delivered to surviving workers automatically.
- Elasticity: you can add or remove workers based on load without reconfiguring the queue.
But it introduces two new challenges that don't exist with a single consumer: duplicate delivery (a message may be processed more than once) and ordering gaps (messages may arrive out of order across workers). Everything in this article is about solving those two challenges at scale.
One-Line Definition
Competing consumers scales message processing horizontally by running multiple workers against a shared queue, where the queue coordinates message assignment so each message is processed by exactly one worker at a time.
Analogy
Think of a bank with multiple teller windows. Customers enter a single line (the queue). When a teller finishes serving one customer, the next customer in line walks up. No customer gets served by two tellers simultaneously. If a teller goes on break, customers keep flowing to the remaining windows. If the line gets long, the bank opens more windows.
The queue is the single line. The tellers are the workers. Opening more windows is adding more consumer instances. The bank doesn't need to redesign the line to add windows; scaling is just a capacity decision.
One important detail in this analogy: the bank doesn't let two tellers serve the same customer. The queue enforces exclusive delivery. And if a teller collapses mid-transaction, the customer returns to the front of the line and the next available teller takes over. That's the retry mechanism.
The analogy breaks at ordering: bank customers don't care if customer #5 finishes before customer #3. But in message processing, ordering sometimes matters. That's where partitions and message keys come in.
Solution Walkthrough
SQS Model: Simple Competing Consumers
With SQS, competing consumers are trivially simple. Multiple processes call ReceiveMessage. SQS handles distribution automatically.
When a worker receives a message, SQS hides it from other workers for the visibility timeout (default 30 seconds). The worker processes the message and calls DeleteMessage. If the worker crashes before deleting, the visibility timeout expires and another worker picks up the message.
Scaling: just add more workers. SQS handles load distribution automatically. No partition assignment concept, no rebalancing, no maximum parallelism limit beyond SQS's own throughput ceiling (nearly unlimited for standard queues).
Ordering: not guaranteed in standard queues. Messages can arrive out of order. FIFO queues preserve order within a MessageGroupId but are limited to 300 messages per second per group (3,000 with batching).
Visibility timeout tuning is the most important SQS configuration:
- Too short: slow processing causes the timeout to expire, and another worker picks up the message before the first one finishes. Result: duplicate processing.
- Too long: a crashed worker's messages stay hidden for the full timeout before being retried. Result: delayed processing on failure.
The rule of thumb: set visibility timeout to 2-3x your expected processing time. If processing takes 10 seconds, set the timeout to 30 seconds.
The SQS message lifecycle looks like this:
A message bouncing between Available and InFlight is either a poison pill or a sign that your visibility timeout is too short. Monitor the ApproximateNumberOfMessagesNotVisible metric: if it stays high while ApproximateNumberOfMessagesDeleted stays low, messages are being received but not successfully processed.
Kafka Model: Consumer Groups and Partitions
Kafka fundamentally changes the competing consumers model. Instead of a shared queue, messages live in ordered partitions within a topic.
The key constraint: each partition is assigned to exactly one consumer within a group at any time.
- 6 partitions, 3 consumers: each consumer handles 2 partitions
- 6 partitions, 6 consumers: each consumer handles 1 partition (maximum parallelism)
- 6 partitions, 8 consumers: 6 consumers are active, 2 sit idle wasting resources
Maximum parallelism equals the partition count. If you want 20 parallel workers, you need at least 20 partitions. You cannot add a 21st consumer and have it do useful work. Choose your partition count carefully at topic creation time; increasing it later changes message routing and can break ordering guarantees.
Kafka preserves order within a partition. If ordering matters for a key (all events for user X processed in order), route events for the same key to the same partition using the message key. All messages with key user-123 go to the same partition, which is assigned to exactly one consumer.
Multiple Consumer Groups
One of Kafka's most powerful features is that multiple consumer groups can independently consume the same topic. Each group maintains its own offset position and receives every message.
The email-sender group has 2 consumers processing notifications. The analytics group has 1 consumer building dashboards. The fraud-check group has 2 consumers running fraud models. All three groups read every order event independently, at their own pace. If the analytics consumer falls behind, it doesn't affect the email or fraud workers.
This is impossible with SQS's standard queue model. Once a consumer deletes a message from SQS, it's gone. To achieve the same multi-consumer pattern with SQS, you'd need SNS fan-out to multiple queues, one per consumer type. That works, but it's more operational overhead than Kafka's native multi-group model.
Message Ordering Considerations
The ordering question comes up in every interview discussion about competing consumers:
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.