Kafka vs. RabbitMQ
When to choose Kafka vs. RabbitMQ: log-based vs. traditional broker, message replay, ordering guarantees, fan-out patterns, retention, and which architecture fits each messaging pattern.
TL;DR
| Dimension | Choose Kafka | Choose RabbitMQ |
|---|---|---|
| Throughput | Need 100K+ msgs/sec per broker with batched writes | 20K-50K msgs/sec is sufficient, individual message latency matters more |
| Message replay | Must replay historical events (bug fixes, new consumers, auditing) | Messages are fire-and-forget once processed |
| Ordering | Need strict per-entity ordering (events for user X always in order) | Ordering is not critical, or single-consumer-per-queue is acceptable |
| Routing | Topic-based fan-out with independent consumer group offsets | Complex per-message routing rules (headers, patterns, priority, DLX) |
| Consumer model | Pull-based, consumers control pace, backpressure is natural | Push-based, broker manages delivery, prefetch tuning |
| Retention | Need days/weeks/indefinite retention for event sourcing or audit | Messages are transient, delete on ACK |
Default answer: Use Kafka for event streaming and high-throughput log pipelines. Use RabbitMQ for task queues, request-reply patterns, and complex routing. They solve different problems.
The Framing
Your order service publishes an "order created" event. Three downstream services need it: inventory, billing, and analytics. On Monday, the analytics team deploys a bug that silently drops half the events. They discover it Wednesday.
With RabbitMQ, those events are gone. The moment each message was acknowledged, the broker deleted it. The analytics team has no way to reprocess Tuesday's orders without re-publishing from the source.
With Kafka, the analytics team resets their consumer group offset to Monday at midnight and replays every event. The fix is deployed, reprocessing completes in an hour, and no other consumer is affected. This is the moment most teams realize the two tools solve fundamentally different problems.
RabbitMQ is a message broker. It routes individual messages from producers to consumers, and its job is done when the consumer acknowledges. Kafka is a distributed commit log. It persists an ordered, immutable stream of records, and any number of consumers can read from any position independently.
This distinction cascades into everything: ordering guarantees, throughput characteristics, consumer models, operational patterns, and failure recovery. I've seen teams waste months trying to make RabbitMQ behave like Kafka (or vice versa) because they picked the tool before understanding which problem they had.
How Each Works
Kafka: Distributed Commit Log
Kafka organizes data into topics, and each topic is split into partitions. Each partition is an append-only, ordered log of records stored on disk. Producers write to the end of a partition, and consumers read from any position by tracking an offset (a sequential number).
Consumer groups are the parallelism unit. Within a group, each partition is assigned to exactly one consumer. If you have 12 partitions and 4 consumers in a group, each consumer reads 3 partitions. Adding a fifth consumer rebalances the assignment. A separate consumer group reads the same data independently at its own pace.
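The assignment arithmetic can be sketched in a few lines of plain Python. This is a toy round-robin assignor, not the client's actual code (real clients ship configurable range, round-robin, and cooperative sticky assignors), but it shows the invariant: each partition belongs to exactly one group member.

```python
# Toy partition assignor: each partition goes to exactly one consumer
# in the group, spread round-robin across the (sorted) members.

def assign_partitions(num_partitions, members):
    members = sorted(members)
    assignment = {m: [] for m in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment

# 12 partitions, 4 consumers -> 3 partitions each
four = assign_partitions(12, ["c1", "c2", "c3", "c4"])
assert all(len(ps) == 3 for ps in four.values())

# Adding a fifth consumer triggers a rebalance: sizes become 3,3,2,2,2
five = assign_partitions(12, ["c1", "c2", "c3", "c4", "c5"])
assert sorted(len(ps) for ps in five.values()) == [2, 2, 2, 3, 3]
```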
```python
# Producer: append to a topic with a partition key
producer.send(
    topic="orders",
    key="user_456",  # Same key = same partition = ordered
    value=serialize(order_event),
    headers={"event_type": "order.created"}
)

# Consumer: poll for records, commit offset after processing
while True:
    records = consumer.poll(timeout_ms=100)
    for record in records:
        process(record)
    consumer.commit()  # Mark offset as processed
```
Retention is time-based or size-based (default 7 days). Records stay on disk whether or not any consumer has read them. Log compaction keeps only the latest value per key, useful for CDC and materialized views.
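Compaction's "latest value per key" rule can be modeled in a few lines. This is a sketch of the semantics only, not the broker's segment-based implementation; `None` stands in for a tombstone record.

```python
# Sketch of log compaction semantics: keep only the latest record per
# key; a None value is a tombstone that deletes the key entirely.

def compact(log):
    latest = {}
    for offset, (key, value) in enumerate(log):
        if value is None:
            latest.pop(key, None)   # tombstone: drop the key
        else:
            latest[key] = (offset, value)
    # surviving records, still in offset order
    return sorted(latest.values())

log = [
    ("user_1", "a"),   # offset 0, superseded later
    ("user_2", "b"),   # offset 1, deleted by tombstone
    ("user_1", "c"),   # offset 2, survives
    ("user_2", None),  # offset 3, tombstone
]
assert compact(log) == [(2, "c")]
```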
Kafka uses in-sync replicas (ISR) for durability. Each partition has a leader and N-1 followers. Writes go to the leader, followers fetch and replicate them, and with acks=all a write is considered committed once every ISR member has acknowledged it. If a follower falls behind, it is removed from the ISR until it catches up.
KRaft (Kafka Raft) replaces ZooKeeper for metadata management in Kafka 3.3+. The controller quorum handles broker registration, partition assignment, and leader election using Raft consensus. My recommendation for new clusters: always use KRaft. ZooKeeper is on its way out.
Exactly-once semantics require three components working together. Idempotent producers (enable.idempotence=true) guarantee that retried sends produce exactly one record per message (the broker deduplicates using producer ID and sequence number). Transactional writes wrap read-process-write operations in an atomic transaction: consumer reads from input topic, processor transforms, producer writes to output topic. If any step fails, the entire transaction aborts. The isolation.level=read_committed setting on downstream consumers ensures they only see records from committed transactions.
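The broker-side deduplication step can be sketched as a toy model of the (producer ID, sequence number) check. This is an illustration of the rule, not the broker's implementation, which works on record batches and also fences zombie producers via epochs.

```python
# Toy model of broker-side deduplication for idempotent producers: the
# broker tracks, per producer ID, the next sequence number it expects
# for this partition; a retried batch with an old sequence is dropped.

class PartitionLog:
    def __init__(self):
        self.records = []
        self.next_seq = {}  # producer_id -> next expected sequence

    def append(self, producer_id, seq, value):
        expected = self.next_seq.get(producer_id, 0)
        if seq < expected:
            return "duplicate"      # already written: the retry is dropped
        if seq > expected:
            return "out_of_order"   # gap: reject so nothing is skipped
        self.records.append(value)
        self.next_seq[producer_id] = seq + 1
        return "ok"

log = PartitionLog()
assert log.append("p1", 0, "a") == "ok"
assert log.append("p1", 0, "a") == "duplicate"   # network retry, not re-appended
assert log.append("p1", 1, "b") == "ok"
assert log.records == ["a", "b"]
```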
Consumer group rebalancing redistributes partitions when consumers join or leave. The cooperative sticky assignor (default in newer clients) minimizes partition movement during rebalancing. Processing pauses briefly on affected partitions during a rebalance. At scale, this pause causes consumer lag spikes. Setting session.timeout.ms=45000 and heartbeat.interval.ms=15000 balances between fast failure detection and avoiding unnecessary rebalances from brief network hiccups.
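"Minimizes partition movement" is the whole point of the sticky assignor, and a toy model makes it concrete. The helper below is hypothetical, not the client's algorithm: when a fourth consumer joins, only the partitions it takes over change hands, while eager rebalancing would revoke and reassign everything.

```python
# Toy sticky rebalance: when a member joins, steal partitions from the
# most-loaded members until the newcomer reaches its fair share; every
# other assignment stays put (eager rebalancing revokes everything).

def sticky_add_member(assignment, new_member):
    assignment = {m: list(ps) for m, ps in assignment.items()}
    assignment[new_member] = []
    target = sum(len(ps) for ps in assignment.values()) // len(assignment)
    moved = []
    while len(assignment[new_member]) < target:
        donor = max(assignment, key=lambda m: len(assignment[m]))
        moved.append(assignment[donor].pop())
        assignment[new_member].append(moved[-1])
    return assignment, moved

old = {"c1": [0, 1, 2, 3], "c2": [4, 5, 6, 7], "c3": [8, 9, 10, 11]}
new, moved = sticky_add_member(old, "c4")
assert len(moved) == 3          # only 3 of 12 partitions change hands
assert new["c1"] == [0, 1, 2]   # everything else stays where it was
```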
Tiered storage (KIP-405, available in Kafka 3.6+ and Confluent Platform) offloads old log segments to object storage (S3) while keeping recent segments on local disk. This dramatically reduces broker disk costs for topics with long retention (30+ days). Brokers store the "hot" data locally for low-latency reads, and transparently fetch "cold" data from S3 when consumers read historical offsets. For teams that want infinite retention without proportionally scaling disk, tiered storage is a game-changer.
Kafka's zero-copy optimization (sendfile system call) transfers data directly from the page cache to the network socket without copying through user space. This is why Kafka can sustain multi-GB/s read throughput per broker with minimal CPU usage. Consumers reading recent data (still in page cache) get near-memory-speed performance. Consumers reading data older than the page cache pay disk I/O cost.
```yaml
# Key Kafka producer configuration
enable.idempotence: true                    # Exactly-once per partition
acks: all                                   # Wait for all ISR replicas
retries: 2147483647                         # Infinite retries (idempotent)
max.in.flight.requests.per.connection: 5    # Safe with idempotence
compression.type: lz4                       # Batch compression
linger.ms: 5                                # Batch for 5ms before sending
batch.size: 16384                           # 16 KB batches
```
RabbitMQ: AMQP Broker with Exchange Routing
RabbitMQ implements AMQP 0-9-1. Producers publish messages to exchanges, not queues directly. Exchanges route messages to queues based on bindings and routing keys. The exchange type determines the routing algorithm.
```python
# Producer: publish to an exchange with a routing key
channel.basic_publish(
    exchange="order_events",
    routing_key="order.created",
    body=serialize(order_event),
    properties=pika.BasicProperties(
        delivery_mode=2,  # Persistent to disk
        content_type="application/json"
    )
)

# Consumer: subscribe to a queue, ACK after processing
def callback(ch, method, properties, body):
    process(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(
    queue="billing_orders",
    on_message_callback=callback
)
```
Four exchange types handle different routing patterns:
- Direct: exact match on routing key (e.g., `order.created` matches only `order.created`)
- Topic: wildcard pattern matching (`order.*` matches `order.created`; `#.created` matches anything ending in `.created`)
- Fanout: broadcast to all bound queues regardless of routing key
- Headers: match on message header attributes instead of routing key
RabbitMQ pushes messages to consumers (the broker initiates delivery). Prefetch count controls how many unacknowledged messages a consumer can hold, providing built-in backpressure. Once a consumer ACKs a message, it is deleted from the queue.
Dead letter exchanges (DLX) capture messages that are rejected, expired, or exceed queue length. Priority queues reorder messages by priority level (1-255). Quorum queues (introduced in 3.8) provide Raft-based replication for high availability, replacing the older mirrored queue approach.
Message flow in RabbitMQ works like this: the producer sends a message via an AMQP channel to the exchange. The exchange evaluates bindings and routes the message to zero or more queues. Each queue stores the message in memory or on disk depending on persistence settings. The broker then pushes the message to a subscribed consumer based on the prefetch count.
RabbitMQ's connection model uses multiplexed channels over a single TCP connection. One connection can have hundreds of channels, each handling independent message streams. This is efficient for applications that publish and consume from many queues: you open one TCP connection and multiplex the traffic over lightweight channels. Kafka uses one TCP connection per broker, with the client library managing which partitions map to which broker connections internally.
Prefetch tuning is critical for throughput. basic.qos(prefetch_count=1) means the broker sends one message at a time, waiting for ACK before sending the next. This guarantees fair distribution but limits throughput. basic.qos(prefetch_count=100) sends up to 100 unacknowledged messages, keeping the consumer busy but risking message concentration if processing is slow. My default: start at 20-50 for background jobs, 1-5 for long-running tasks, and tune based on consumer processing time.
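A toy dispatch model illustrates the trade-off. This is not RabbitMQ's actual scheduler, and ACKs are omitted so the buffers only fill, but it is enough to show the concentration effect of a large prefetch.

```python
# Toy dispatch model for prefetch: the broker hands each message to the
# first worker with spare prefetch credit. With prefetch=1 work spreads
# fairly; with a large prefetch one worker can buffer the whole burst.

from collections import deque

def dispatch(messages, workers, prefetch):
    buffers = {w: deque() for w in workers}
    for msg in messages:
        target = next(w for w in workers if len(buffers[w]) < prefetch)
        buffers[target].append(msg)
    return {w: len(q) for w, q in buffers.items()}

# prefetch=1: strict fair spread, one unacked message per worker
assert dispatch(["m1", "m2", "m3"], ["w1", "w2", "w3"], prefetch=1) == \
    {"w1": 1, "w2": 1, "w3": 1}

# large prefetch: the first worker buffers the entire burst
assert dispatch(["m1", "m2", "m3"], ["w1", "w2", "w3"], prefetch=100) == \
    {"w1": 3, "w2": 0, "w3": 0}
```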
Quorum queues use Raft consensus to replicate messages across a configurable number of nodes (default 3). Writes require majority acknowledgment (2 of 3 nodes). This provides better data safety than mirrored queues (which could lose messages during network partitions) at the cost of slightly higher write latency. For new deployments, always use quorum queues over classic mirrored queues.
RabbitMQ management plugin provides an HTTP API and web UI for monitoring queue depths, consumer counts, message rates, and connection states. Key operational metrics to monitor: queue depth (messages ready for delivery), unacked count (messages delivered but not yet acknowledged), publish rate vs consume rate (if publish > consume, the queue grows), and memory/disk alarms (RabbitMQ blocks publishers when memory exceeds the watermark threshold, default 40% of system RAM).
RabbitMQ Streams (introduced in 3.9) add Kafka-like append-only log semantics to RabbitMQ. Streams support offset-based consumption, time-based offset seeking, and message replay. This narrows the gap between the two tools, but streams lack Kafka's partition model, consumer groups, and ecosystem maturity. I view RabbitMQ Streams as useful for "I mostly need RabbitMQ but sometimes need replay for one queue" rather than a replacement for Kafka's streaming architecture.
RabbitMQ's plugin ecosystem extends its capabilities. The Shovel plugin copies messages between brokers (useful for cross-datacenter replication). The Federation plugin loosely connects brokers across geographic regions with eventual consistency. The consistent hash exchange distributes messages across queues using consistent hashing, providing load-balanced consumption without application-side routing logic.
Publisher confirms are RabbitMQ's equivalent of Kafka's acks. With confirms enabled, the broker sends an acknowledgment to the producer after the message is written to disk (and replicated, for quorum queues). Without confirms, a crash between the network send and disk write loses the message. Always enable publisher confirms for persistent messages in production. The latency cost is 1-5ms per confirmed message.
RabbitMQ's lazy queues store messages to disk immediately instead of keeping them in memory first. For queues that build up large backlogs (consumers are slow or offline), lazy queues prevent memory exhaustion. The trade-off is higher per-message latency (disk write on every enqueue). Default (non-lazy) queues are faster for high-throughput, low-backlog scenarios where messages are consumed almost immediately.
Key Configuration Differences
```yaml
# Kafka: broker-level + topic-level settings
num.partitions: 12                  # Default partitions per topic
default.replication.factor: 3       # Copies across brokers
log.retention.hours: 168            # 7 days default
log.segment.bytes: 1073741824       # 1 GB segment files
auto.create.topics.enable: false    # Explicit topic creation
min.insync.replicas: 2              # Quorum for acks=all
```

```yaml
# RabbitMQ: queue-level settings
x-queue-type: quorum                # Raft-replicated
x-max-length: 1000000               # Max messages in queue
x-message-ttl: 86400000             # 24h message expiry
x-dead-letter-exchange: dlx         # Failed message routing
x-delivery-limit: 5                 # Redelivery attempts
```
The configuration philosophy differs fundamentally. Kafka configurations are cluster-wide and topic-wide (you tune brokers and topics, not individual messages). RabbitMQ configurations are queue-level and message-level (each queue and each message can have different TTL, priority, persistence settings). This reflects the architectural difference: Kafka treats the stream as a unit, RabbitMQ treats each message as a unit.
Delivery Guarantees Comparison
| Guarantee | Kafka | RabbitMQ |
|---|---|---|
| At-most-once | acks=0 (fire and forget) | No confirms, auto-ACK |
| At-least-once | acks=all + consumer manual commit | Publisher confirms + manual ACK |
| Exactly-once | Idempotent producer + transactions + read_committed consumers | Not natively supported (requires application-level deduplication) |
Kafka's exactly-once semantics (EOS) is the strongest delivery guarantee available in mainstream messaging systems. It requires the full chain: idempotent producer (dedup at broker), transactional producer (atomic writes across partitions), and isolation.level=read_committed on consumers (only see committed transaction records). The performance cost of EOS is ~5-10% throughput reduction compared to at-least-once.
RabbitMQ achieves effective exactly-once at the application level by combining publisher confirms, consumer ACK, and idempotent message processing (using a deduplication table or idempotency key). This is more work but gives you full control over the deduplication logic.
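The deduplication half of that pattern can be sketched directly. In real code, `processed` would be a database table with a unique constraint on the message ID, and the insert would happen in the same transaction as the side effect; the in-memory set here is a stand-in.

```python
# Sketch of application-level exactly-once on top of at-least-once
# delivery: check the idempotency key before performing the side effect.

processed = set()
charges = []

def handle(message_id, body):
    if message_id in processed:
        return "skipped"        # redelivery after a lost ACK
    charges.append(body)        # the actual side effect (e.g. charge a card)
    processed.add(message_id)   # record the key atomically with the work
    return "processed"

assert handle("msg-1", "charge $10") == "processed"
assert handle("msg-1", "charge $10") == "skipped"   # duplicate delivery
assert charges == ["charge $10"]
```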
For most applications, at-least-once delivery with idempotent consumers is the pragmatic choice for both tools. Design consumers to safely process the same message twice (using database upserts, idempotency keys, or conditional writes), and the delivery guarantee question becomes less critical than it appears in theory.
Operational Complexity Comparison
| Dimension | Kafka | RabbitMQ |
|---|---|---|
| Minimum production cluster | 3 brokers + KRaft (or ZooKeeper) | 3 nodes (quorum queues) |
| Disk requirements | High (retention stores all data) | Low (messages deleted on ACK) |
| Memory requirements | Page cache dependent (more RAM = faster reads) | Queue depth dependent (large backlogs need more RAM) |
| Key metrics to monitor | Consumer lag, under-replicated partitions, ISR shrink rate | Queue depth, unacked count, memory alarms |
| Scaling | Add brokers + rebalance partitions | Add nodes + configure quorum membership |
| Managed options | Confluent Cloud, Amazon MSK, Azure Event Hubs | CloudAMQP, Amazon MQ, Azure Service Bus |
The fan-out model highlights the architectural difference. In Kafka, adding a new consumer is purely consumer-side: create a new consumer group that reads from the existing topic. No producer changes, no broker configuration. In RabbitMQ, adding a new consumer requires creating a queue and binding it to the exchange.
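The consumer-side-only fan-out follows directly from the log model, and can be sketched with a list and a dict of read positions (toy names, not any client API):

```python
# Toy model of log-based fan-out: one shared append-only topic, and each
# consumer group is nothing more than its own read position into it.

topic = ["order_1", "order_2", "order_3"]
offsets = {"billing": 0, "inventory": 0}

def poll(group):
    if offsets[group] < len(topic):
        record = topic[offsets[group]]
        offsets[group] += 1
        return record

# Adding "analytics" later touches no producer and no existing group:
offsets["analytics"] = 0
assert poll("billing") == "order_1"
assert poll("analytics") == "order_1"   # same data, independent position
assert poll("analytics") == "order_2"
assert offsets["billing"] == 1          # billing's pace is unaffected
```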
Head-to-Head Comparison
| Dimension | Kafka | RabbitMQ | Verdict |
|---|---|---|---|
| Throughput | 100K-2M msgs/sec per broker (batched, sequential disk I/O) | 20K-50K msgs/sec per node (per-message routing overhead) | Kafka, 5-20x higher |
| Latency | 5-50ms typical (batching adds latency for throughput) | Sub-ms to 5ms (immediate push, no batching) | RabbitMQ for individual messages |
| Message replay | Any consumer can re-read from any offset at any time | Once ACK'd, message is deleted forever | Kafka, decisively |
| Ordering | Strict per-partition (same key = same partition = ordered) | Per-queue with single consumer only; no ordering with competing consumers | Kafka, more granular control |
| Routing | Topic-based only; consumers read entire topics | Exchange-based: direct, topic, fanout, headers with flexible binding rules | RabbitMQ, far more expressive |
| Consumer model | Pull-based: consumers poll at their own pace | Push-based: broker delivers with prefetch-based flow control | RabbitMQ simpler for task queues |
| Backpressure | Natural: slow consumer falls behind but never overloads | Prefetch count + credit-based flow control | Both good, different mechanisms |
| Persistence | Always persisted to disk, configurable retention (hours to forever) | Optional per-message persistence, deleted on ACK | Kafka, built-in retention |
| Priority | No native priority; separate topics for priority lanes | Priority queues (1-255 levels) | RabbitMQ |
| Operational complexity | Higher: partitions, ISR, replication factor, KRaft/ZooKeeper | Lower: simpler topology, but quorum queues add complexity | RabbitMQ simpler to start |
The fundamental tension: Kafka optimizes for throughput, ordering, and replay at the cost of routing flexibility and operational simplicity. RabbitMQ optimizes for message-level routing, priority, and ease of setup at the cost of throughput ceiling and replay capability.
When Kafka Wins
Kafka is the right choice when your primary concerns are throughput, ordering, and the ability to replay events.
Event streaming and event sourcing. If downstream services need to rebuild state from a stream of events, Kafka's retention and replay make this possible. RabbitMQ deletes messages on ACK, making replay impossible without re-publishing from the source. For systems where "what happened" matters as much as "what is the current state," Kafka is the only option.
High-throughput ingestion. Kafka's sequential disk I/O and batched writes handle 100K-2M messages per second per broker. A three-broker Kafka cluster handles what would require 10-20 RabbitMQ nodes. For log aggregation, clickstream, IoT telemetry, or metrics pipelines, the throughput gap is decisive.
Exactly-once semantics. Kafka's idempotent producers (enable.idempotence=true) combined with transactional writes (read-process-write atomically) provide exactly-once processing guarantees. RabbitMQ provides at-most-once or at-least-once, not exactly-once.
Multiple independent consumers. In Kafka, adding a new consumer group to read from an existing topic requires zero changes to producers or brokers. Each group tracks its own offset independently. In RabbitMQ, adding a new service requires creating a queue and binding it to the exchange.
Per-entity ordering at scale. Kafka's partition key guarantees that all events for user_456 go to the same partition and are processed in order. You can have 100 partitions (100-way parallelism) while maintaining per-user ordering. RabbitMQ only guarantees ordering within a single queue consumed by a single consumer, which limits parallelism.
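The key-to-partition mapping is just a stable hash of the key. A minimal sketch: Kafka's default partitioner actually uses murmur2, so `crc32` here is only a stand-in for "any deterministic hash".

```python
# Sketch of "same key = same partition = ordered": the producer hashes
# the partition key, so every event for a given user lands in the same
# partition and is consumed in publish order.

import zlib

NUM_PARTITIONS = 100

def partition_for(key):
    # deterministic hash: the same key always maps to the same partition
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

p = partition_for("user_456")
assert all(partition_for("user_456") == p for _ in range(1000))

# different users spread across partitions, giving up to 100-way parallelism
assert len({partition_for(f"user_{i}") for i in range(1000)}) > 1
```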
Audit trails and compliance. Financial systems, healthcare, and regulated industries need immutable event logs. Kafka's retention (set to infinite with compaction) serves as a durable audit trail without additional infrastructure.
Change Data Capture (CDC) backbone. Debezium reads database WAL changes and publishes them to Kafka topics. Downstream consumers (search indexes, cache invalidation, analytics) react to data changes without polling the source database. The CDC-to-Kafka pipeline is the standard pattern for keeping derived data stores in sync.
Stream processing with Kafka Streams or Flink. Kafka is both the transport layer and the storage layer for stream processing. Kafka Streams (a Java library, not a separate cluster) reads from input topics, processes records, and writes to output topics. Apache Flink reads from Kafka for more complex stateful processing (windowed aggregations, event-time processing, exactly-once across multiple sinks). The key advantage of Kafka Streams: it's a library, not a cluster. You deploy it as part of your application with no additional infrastructure.
Kafka Connect for no-code integrations. Kafka Connect provides source connectors (ingest data from databases, files, APIs into Kafka) and sink connectors (write data from Kafka to databases, S3, Elasticsearch, data warehouses). For common integrations (PostgreSQL CDC, S3 archival, Elasticsearch indexing), Kafka Connect eliminates custom consumer code entirely. A JSON configuration file defines the connector, and Kafka Connect runs it as a distributed, fault-tolerant task.
Monitoring and operational maturity. Kafka exposes detailed JMX metrics: consumer lag per partition, broker throughput, ISR shrink/expand rates, under-replicated partitions. Tools like Burrow, LinkedIn's Kafka monitor, track consumer group health. At scale, consumer lag is the single most important metric: it tells you how far behind each consumer group is from the latest data.
Log compaction for materialized views. Kafka's log compaction retains only the latest value per key, indefinitely. This turns a Kafka topic into a slowly-updating key-value store. CDC topics with compaction enabled provide a "current state" view: consumers reading from the beginning get the latest version of every row without processing the full history. This is the foundation of Kafka's use as a state store, not just a message transport.
```text
# Critical Kafka monitoring metrics
- kafka.server:BrokerTopicMetrics:MessagesInPerSec              # Ingest rate
- kafka.server:ReplicaManager:UnderReplicatedPartitions         # Data safety
- kafka.consumer:FetchManager:records-lag-max                   # Consumer health
- kafka.server:BrokerTopicMetrics:BytesOutPerSec                # Read throughput
- kafka.controller:ControllerStats:UncleanLeaderElectionsPerSec # Split brain risk
```
When RabbitMQ Wins
RabbitMQ excels at traditional message brokering, task distribution, and scenarios where per-message routing flexibility matters more than throughput or replay.
Task queues with competing consumers. Job processing (email sending, image resizing, PDF generation) where you need to distribute work across a pool of workers. RabbitMQ's push model with prefetch count is purpose-built for this. Workers receive tasks automatically, and if a worker crashes, the message is redelivered to another worker.
Complex routing rules. A payment event needs to go to the fraud service (always), the rewards service (only for amounts over $100), and the audit service (only for international transactions). RabbitMQ's topic exchange with binding patterns like payment.international.# handles this at the broker level. In Kafka, you'd need topic-per-route or consumer-side filtering.
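The binding semantics are small enough to reproduce: `*` matches exactly one dot-separated word, `#` matches zero or more words. This matcher is a sketch of the rules only; RabbitMQ's real implementation evaluates bindings with a trie.

```python
# Sketch of AMQP topic-exchange matching: '*' = exactly one word,
# '#' = zero or more words, words separated by dots.

def binding_matches(pattern, routing_key):
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":                      # zero or more words
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return p[0] in ("*", k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))

assert binding_matches("order.*", "order.created")
assert not binding_matches("order.*", "order.created.eu")   # '*' is one word
assert binding_matches("payment.international.#", "payment.international.wire.eu")
assert binding_matches("payment.international.#", "payment.international")  # '#' may be empty
```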
Request-reply patterns. RabbitMQ supports reply-to queues and correlation IDs natively. The RPC-over-message-queue pattern works naturally. Kafka can do request-reply, but it's awkward and requires response topics with correlation matching.
Priority scheduling. If high-priority messages need to jump the queue (payment retries before marketing emails), RabbitMQ priority queues handle this natively. Kafka has no priority concept. You'd create separate priority topics and consume from the high-priority topic first, but that's application-level logic you're building yourself.
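That application-level logic amounts to "drain the high-priority topic first." A toy sketch, with lists standing in for per-topic polls:

```python
# Toy priority lanes for Kafka: two topics, and the consumer takes from
# the high-priority one before touching the low-priority one. In Kafka
# this ordering logic lives entirely in your application.

high = ["payment_retry_1"]
low = ["newsletter_1", "newsletter_2"]

def next_message():
    if high:
        return high.pop(0)
    if low:
        return low.pop(0)
    return None

drained = []
while (msg := next_message()) is not None:
    drained.append(msg)

assert drained == ["payment_retry_1", "newsletter_1", "newsletter_2"]
```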
Low-latency individual messages. If you need sub-millisecond broker-to-consumer delivery for individual messages (not batch throughput), RabbitMQ's push model delivers faster than Kafka's poll-based consumption. Kafka's batching optimizes for throughput at the cost of per-message latency.
Simpler operations for small teams. A single RabbitMQ node handles most moderate workloads. No ZooKeeper/KRaft, no partition management, no ISR monitoring. For a team of 3-5 engineers running 10K messages/sec, RabbitMQ's operational simplicity is a real advantage. I've seen small teams adopt Kafka and spend more time maintaining the cluster than building features.
Retry and delay queues. RabbitMQ's dead letter exchange (DLX) with TTL-based retry queues creates sophisticated retry patterns declaratively. A failed message goes to a retry queue with a 30-second TTL, then back to the main queue for reprocessing. After 5 failures, it goes to a dead letter queue for human review. Building this in Kafka requires custom retry topic chains and manual offset management.
```python
# RabbitMQ retry pattern: DLX with TTL
channel.queue_declare(
    queue='main_queue',
    arguments={
        'x-dead-letter-exchange': 'retry_exchange',
        'x-dead-letter-routing-key': 'retry'
    }
)
channel.queue_declare(
    queue='retry_queue',
    arguments={
        'x-message-ttl': 30000,  # 30 seconds
        'x-dead-letter-exchange': 'main_exchange',
        'x-dead-letter-routing-key': 'main'
    }
)
```
Message acknowledgment flexibility. RabbitMQ supports individual ACK, bulk ACK (acknowledge all messages up to a delivery tag), and NACK with requeue. This fine-grained control over message lifecycle is useful when processing requires validation steps before committing.
The Nuance
Here's the honest answer: Kafka and RabbitMQ are not competitors. They solve different problems, and the "which one should I use" framing is usually the wrong question.
Kafka is an event streaming platform. It answers: "What happened, in what order, and let me replay it." RabbitMQ is a message broker. It answers: "Take this message and deliver it to the right consumer." Many production systems use both.
The common hybrid pattern: Kafka handles event streaming (order events, click events, state changes) while RabbitMQ handles command distribution (send this email, resize this image, process this payment). Events flow through Kafka because multiple services need to independently consume them and replay is valuable. Commands flow through RabbitMQ because they need routing, priority, and reliable task distribution.
Schema evolution is critical for long-lived Kafka topics. Without schema management, producers can change message structure and break consumers. Confluent Schema Registry (or AWS Glue Schema Registry) enforces Avro, Protobuf, or JSON Schema compatibility rules. Backward compatibility means new consumers can read old messages. Forward compatibility means old consumers can read new messages. Full compatibility ensures both directions. For production Kafka deployments, Schema Registry is not optional.
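The backward-compatibility rule reduces to "every field the new reader adds must carry a default." A sketch of the Avro-style check, with a deliberately simplified schema representation (field name mapped to its default, `NO_DEFAULT` meaning required):

```python
# Toy schema model: field name -> default value (NO_DEFAULT = required).
NO_DEFAULT = object()

def backward_compatible(old, new):
    # A new reader can decode old records iff every field it added
    # carries a default to fill in when the old writer omitted it.
    return all(new[f] is not NO_DEFAULT for f in set(new) - set(old))

v1 = {"order_id": NO_DEFAULT, "amount": NO_DEFAULT}
v2 = {**v1, "currency": "USD"}        # added with a default: OK
v3 = {**v1, "customer": NO_DEFAULT}   # added as required: breaks old data

assert backward_compatible(v1, v2)
assert not backward_compatible(v1, v3)
```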
I've seen teams try to replace RabbitMQ entirely with Kafka for task queues, and the result is always worse. Kafka's pull model, lack of per-message routing, and lack of priority make it a poor task queue. Similarly, teams that use RabbitMQ for event streaming inevitably hit the replay wall and wish they'd used Kafka.
The red flag in interviews: if a candidate says "I'd use Kafka" or "I'd use RabbitMQ" without asking what the messaging pattern is, they're pattern-matching on tool names rather than understanding use cases.
Another common mistake: running Kafka for a workload that processes 500 messages per minute. At that volume, Kafka's operational overhead (broker management, partition tuning, consumer group coordination) far exceeds its benefits. RabbitMQ or even a simple SQS queue handles low-volume messaging with dramatically less operational cost. Kafka's advantages emerge at 10K+ messages per second, not 10 messages per second.
The managed service question matters too. Confluent Cloud, Amazon MSK, and Azure Event Hubs provide managed Kafka without the operational burden of running brokers. CloudAMQP provides managed RabbitMQ. For teams without dedicated infrastructure engineers, managed services change the "operational complexity" dimension of this trade-off significantly. The engineering cost shifts from cluster management to monthly bills.
The performance numbers tell the story. A single Kafka broker on a c5.4xlarge instance (16 vCPUs, 32 GB RAM) with 3x gp3 EBS volumes achieves 200K-400K messages/sec sustained write throughput with replication factor 3 and acks=all. A single RabbitMQ node on the same hardware handles 20K-30K persistent message publishes per second with quorum queues. Both benchmarks use 1 KB message payloads.
The latency profile is the inverse. RabbitMQ's publish-to-consume latency (with publisher confirms and consumer ACK) is typically 0.5-2ms for persistent messages. Kafka's produce-to-consume latency is 5-50ms because of batching (linger.ms), network round trips for acks=all, and consumer poll intervals. If you need single-digit millisecond message delivery and can tolerate lower throughput, RabbitMQ wins. If you need 100K+ msgs/sec and can tolerate 20ms latency, Kafka wins.
```text
# Throughput comparison (1 KB messages, 3-node cluster)
Kafka:
  Producer (acks=all, replication=3):      200K-400K msgs/sec
  Consumer (single group, 12 partitions):  500K+ msgs/sec
  End-to-end latency (p99):                10-50ms

RabbitMQ:
  Producer (persistent, quorum queue):     20K-30K msgs/sec
  Consumer (prefetch=50, single queue):    25K-40K msgs/sec
  End-to-end latency (p99):                1-5ms
```
Interview tip: name the pattern before the tool
When messaging comes up in a system design interview, say: "This is an event streaming pattern, so Kafka" or "This is a task distribution pattern, so RabbitMQ." That one sentence demonstrates more understanding than a five-minute feature comparison. If both patterns exist, propose using both tools and explain which traffic goes where.
Real-World Examples
LinkedIn: Kafka was built at LinkedIn to handle activity stream data and operational metrics. Their deployment processes over 7 trillion messages per day across hundreds of Kafka clusters with over 100K partitions. Every user action (profile view, connection request, content interaction) is published as an event. Multiple consumer groups independently consume these events for the news feed, ad targeting, search indexing, and analytics. Replay is critical: when the ML team updates a recommendation model, they replay weeks of engagement events to retrain. At this scale, Kafka's sequential disk I/O and partition-level parallelism are the only architecture that works economically.
Shopify: Uses RabbitMQ for background job processing across their e-commerce platform. When a merchant updates a product, RabbitMQ distributes tasks to workers: update the storefront cache, regenerate SEO metadata, notify subscribed customers, and sync with third-party marketplaces. The routing key pattern (product.updated, product.created, product.deleted) determines which workers receive each message. At peak (Black Friday), their RabbitMQ cluster processes 100K+ jobs per minute using competing consumers with prefetch tuning.
Uber: Runs both Kafka and custom tooling (originally Cherami, later migrated to Kafka for most use cases). Kafka handles trip events, pricing updates, and driver location streams at millions of events per second across 3,000+ microservices. Their initial use of RabbitMQ-style task queues for ride matching was replaced with Kafka partitioned by geographic region, maintaining per-region ordering. The key lesson from Uber's migration: they kept seeing the replay problem (new services needed historical events) and eventually moved nearly everything to Kafka, using Cadence/Temporal for workflow orchestration instead of RabbitMQ-style task queues.
Confluent (Kafka Cloud): Reports that their managed Kafka clusters across thousands of customers handle over 10 petabytes of data per day. The largest single cluster handles 30 GB/sec sustained write throughput. These numbers contextualize why Kafka's architecture (append-only log, sequential I/O, zero-copy transfer) exists: it's designed for throughput that would overwhelm any per-message routing broker.
Netflix: Uses Kafka as the backbone of their data pipeline. Every user interaction (play, pause, seek, rate) is published to Kafka topics. Data engineering teams consume these events for A/B test analysis, content recommendation training, and operational dashboards. Their Kafka clusters handle over 1 trillion messages per day. They built Zuul (API gateway) with Kafka-backed request logging, enabling replay-based debugging: when a user reports an issue, engineers can replay the exact sequence of API calls from Kafka.
GitHub: Uses RabbitMQ for webhook delivery. When a push event occurs, GitHub publishes to RabbitMQ, which routes the webhook payload to delivery workers. Workers make HTTP POST requests to the configured webhook URLs with retry logic (exponential backoff, DLQ for persistently failing endpoints). The task queue pattern fits perfectly: each webhook delivery is an independent job that needs reliable delivery and retry mechanics. At GitHub's scale (billions of webhook deliveries per month), RabbitMQ's per-message routing and dead-letter handling are essential.
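The retry mechanics described in the webhook example — exponential backoff, then a dead-letter queue for endpoints that keep failing — can be sketched as follows. This is an illustrative sketch, not GitHub's actual implementation; `deliver` is a hypothetical stand-in for the HTTP POST to the webhook URL, and the sleep is injected so the logic is testable without waiting.

```python
# Sketch of task-queue retry mechanics: exponential backoff, then
# dead-lettering. All names here are illustrative, not a real API.

def deliver_with_retries(payload, deliver, max_attempts=4, base_delay=1.0,
                         dead_letters=None, sleep=lambda seconds: None):
    """Attempt delivery up to max_attempts times, backing off 1s, 2s, 4s, ...
    On exhaustion, park the payload in a dead-letter list for inspection."""
    for attempt in range(max_attempts):
        if deliver(payload):
            return True
        sleep(base_delay * (2 ** attempt))  # exponential backoff between tries
    if dead_letters is not None:
        dead_letters.append(payload)        # DLQ: persistently failing endpoint
    return False

dlq = []
flaky = iter([False, False, True])          # endpoint fails twice, then recovers
assert deliver_with_retries({"event": "push"}, lambda p: next(flaky),
                            dead_letters=dlq) is True
assert dlq == []                            # succeeded within budget, no DLQ

assert deliver_with_retries({"event": "push"}, lambda p: False,
                            dead_letters=dlq) is False
assert dlq == [{"event": "push"}]           # exhausted retries, dead-lettered
```

In RabbitMQ itself this pattern is typically built from a per-message TTL plus a dead-letter exchange (DLX) rather than in-process sleeps, but the control flow is the same.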
How This Shows Up in Interviews
This trade-off appears every time a system design involves asynchronous communication between services. The interviewer wants to see whether you understand messaging patterns, not just tool names.
What they're testing: Can you articulate when log-based streaming (Kafka) is appropriate versus when a traditional broker (RabbitMQ) is appropriate? Do you understand replay, ordering, and routing trade-offs?
Depth expected at senior level:
- Know Kafka's partition model: how partition keys determine ordering, how consumer groups parallelize
- Understand ISR and replication factor for durability guarantees
- Know RabbitMQ's exchange types and when each is useful
- Explain why Kafka is not a good task queue and RabbitMQ is not a good event log
- Discuss exactly-once semantics (Kafka's idempotent producers + transactions)
- Mention the hybrid pattern (Kafka for events, RabbitMQ for commands) as a mature answer
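Kafka's key-to-partition mapping, which underpins the per-entity ordering point above, can be sketched in a few lines. This is a simplification: real Kafka clients hash keys with murmur2, and `hashlib.md5` here is just an illustrative stand-in with the same "same key, same partition" property.

```python
# Simplified model of Kafka's default partitioner: hash the key,
# mod by the partition count. Real clients use murmur2, not md5.
import hashlib

NUM_PARTITIONS = 12

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition, as a producer would."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for one entity land on one partition, so a consumer
# in the group reads them in produce order.
events = [("order_789", "created"), ("order_42", "created"),
          ("order_789", "paid"), ("order_789", "shipped")]

by_partition: dict[int, list[tuple[str, str]]] = {}
for key, event in events:
    by_partition.setdefault(partition_for(key), []).append((key, event))

p = partition_for("order_789")
assert [e for k, e in by_partition[p] if k == "order_789"] == [
    "created", "paid", "shipped"]  # strict per-entity ordering
```

The same mapping explains the consumer-group parallelism ceiling: at most one consumer per partition within a group, so 12 partitions support at most 12-way parallelism.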
| Interviewer asks | Strong answer |
|---|---|
| "Why Kafka here and not RabbitMQ?" | "We need event replay for bug recovery and new consumer onboarding, per-entity ordering via partition keys, and throughput above 100K events/sec. RabbitMQ deletes on ACK, so replay is impossible." |
| "This looks like a task queue. Why not Kafka?" | "Task queues need priority, per-message routing, and push-based delivery. Kafka's pull model and lack of priority make it a poor fit. RabbitMQ with competing consumers and prefetch tuning is purpose-built for this." |
| "How do you handle ordering in Kafka?" | "Messages with the same partition key go to the same partition, which guarantees ordering for that key. We'd partition by entity ID, so all events for order_789 are strictly ordered. Cross-partition ordering requires a single partition, which limits throughput." |
| "What happens if a Kafka consumer is slow?" | "It falls behind. Consumer lag (the gap between the latest offset and the consumer's committed offset) grows. We monitor lag and scale out consumers within the group. The key advantage: the slow consumer doesn't affect other consumer groups or the producer." |
| "Can you replay messages in RabbitMQ?" | "Not natively. Once ACK'd, messages are deleted. You'd need to re-publish from the source system. Some teams use the shovel plugin to copy messages to a separate queue before consumption, but that's a workaround, not a core capability." |
| "How do you choose the number of partitions?" | "Start with the number of consumers you expect in the largest consumer group. If you need 12-way parallelism, create 12 partitions. You can always add partitions later, but you can never reduce them without recreating the topic. Over-partitioning wastes broker resources; under-partitioning limits parallelism." |
Gotcha: Don't Call Kafka a Message Queue
Kafka is a distributed commit log, not a message queue. Calling it a queue suggests messages are deleted after consumption, which is the opposite of how Kafka works. In an interview, this distinction signals whether you actually understand the tool or are just name-dropping.
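The distinction can be made concrete with two toy data structures — an illustrative sketch, not how either broker is implemented: a queue's read is destructive, while a log retains records and tracks a separate offset per consumer group, which is what makes replay possible.

```python
# Toy models of the two semantics. Destructive read vs. retained log
# with per-group offsets. Purely illustrative.
from collections import deque

class ToyQueue:                    # RabbitMQ-style: delete on consume
    def __init__(self): self._q = deque()
    def publish(self, msg): self._q.append(msg)
    def consume(self): return self._q.popleft()

class ToyLog:                      # Kafka-style: append-only, offset per group
    def __init__(self):
        self._records, self._offsets = [], {}
    def publish(self, msg): self._records.append(msg)
    def consume(self, group):
        off = self._offsets.get(group, 0)
        msg = self._records[off]
        self._offsets[group] = off + 1
        return msg
    def seek(self, group, offset):  # replay: rewind the group's offset
        self._offsets[group] = offset

q, log = ToyQueue(), ToyLog()
for m in ("a", "b"):
    q.publish(m); log.publish(m)

q.consume()                              # "a" is gone from the queue forever
assert log.consume("analytics") == "a"
assert log.consume("billing") == "a"     # independent groups, same record
log.seek("analytics", 0)
assert log.consume("analytics") == "a"   # replay after an offset reset
```

This is exactly the scenario from the framing: the analytics group rewinds its own offset and reprocesses without touching billing's position.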
Quick Recap
- Kafka is a distributed commit log that retains ordered records on disk. Consumers read at their own pace from any offset, enabling replay, event sourcing, and independent consumer groups.
- RabbitMQ is an AMQP message broker that routes individual messages via exchanges (direct, topic, fanout, headers) and deletes them after consumer acknowledgment. It excels at task distribution, complex routing, and priority scheduling.
- Kafka handles 100K-2M msgs/sec per broker through batched sequential I/O. RabbitMQ handles 20K-50K msgs/sec per node with lower per-message latency and push-based delivery.
- Ordering in Kafka is per-partition (use partition keys for per-entity ordering). Ordering in RabbitMQ is per-queue with a single consumer only.
- The strong default: use Kafka for event streaming (notifications, CDC, metrics, audit logs) and RabbitMQ for task queues (jobs, commands, request-reply). Many mature systems run both.
- In interviews, name the messaging pattern before the tool. "This is event streaming, so Kafka" or "This is task distribution, so RabbitMQ" demonstrates deeper understanding than a feature comparison.
What About Amazon SQS and SNS?
SQS (Simple Queue Service) and SNS (Simple Notification Service) are AWS-managed alternatives worth mentioning. SQS provides a managed message queue with no infrastructure to operate: you create a queue, send messages, and poll for them. SQS FIFO queues add ordering within a message group and exactly-once processing via deduplication. SNS provides pub/sub fan-out (one message delivered to many subscribers).
SQS replaces RabbitMQ for simple task queues when you don't need exchange-based routing, priority queues, or DLX retry patterns. The operational simplicity is unbeatable: zero servers, zero configuration, pay per message ($0.40 per million requests). For AWS-native architectures processing under 50K messages/second, SQS is often the right choice over both Kafka and RabbitMQ.
SQS does NOT replace Kafka. SQS has no replay (messages are deleted after processing), no ordering across message groups, and no consumer group model. For event streaming, CDC, and multi-consumer independence, Kafka remains necessary even in AWS-native architectures. Amazon MSK (Managed Streaming for Kafka) provides Kafka as a managed service.
Related Trade-offs
- Message queues for queue fundamentals, delivery guarantees, and dead-letter patterns
- RPC vs. messaging for when synchronous calls beat async messaging
- Batch vs. stream processing for the data processing trade-off that often pairs with Kafka
- Event-driven architecture for choreography, orchestration, and event sourcing patterns
- Sync vs. async communication for the foundational decision that drives messaging tool selection