Kafka internals: log segments, ISR, and consumer groups
How Kafka stores data in log segments, how the ISR (in-sync replicas) set ensures durability, how consumer group rebalancing works under the hood, and how compacted topics enable stateful consumers.
The problem
Your team ships a microservices architecture. Service A produces order events. Services B, C, and D all need those events. You use a traditional message queue with competing consumers, so each message is delivered to exactly one consumer. But B, C, and D all need every event. You create three queues and fan out the message to all three.
Now multiply that by 40 services. The fan-out configuration becomes a maintenance disaster. Worse, Service D falls behind for two hours during a deploy, and by the time it recovers, the queue has already purged those messages. D has lost data permanently because the queue deletes messages after delivery.
Traditional message queues treat consumption as destructive: once a consumer reads a message, it is gone (or at least invisible). They conflate "delivery" with "deletion." If any consumer needs to re-read historical data, replay events, or process at a different pace, the model breaks. This is the problem Kafka's log-based architecture solves.
What it is
Apache Kafka is a distributed commit log: an append-only, ordered sequence of records that persists data to disk and allows multiple consumers to read independently at their own pace without destroying the data.
Think of it as a shared notebook in a library. Anyone can write new entries at the end, but nobody tears out pages. Every reader has their own bookmark. Some readers are on page 50, others on page 5,000. They all read the same notebook without interfering with each other.
A topic is a logical name. Each topic is split into N partitions, and each partition is a separate append-only log, stored as its own set of segment files on the broker that owns it.
How it works
The core write and read paths in Kafka flow through the partition leader, segment files, and consumer offsets.
The write path works as follows:
- The producer serializes the record and hashes the key (or round-robins if no key) to select a partition.
- The producer sends the record to the partition leader broker.
- The leader appends the record to the active log segment file on disk.
- Follower replicas pull the record from the leader and append it to their own segment files.
- Once all in-sync replicas (ISR) acknowledge, the leader considers the record committed.
- The leader responds to the producer with the assigned offset.
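The partitioning step above can be sketched in a few lines. This is an illustrative stand-in, not the real producer: Kafka's default partitioner hashes the serialized key with murmur2, while this sketch uses `zlib.crc32` purely to show the shape of the logic (deterministic hash for keyed records, round-robin for keyless ones).

```python
import itertools
import zlib

_round_robin = itertools.count()  # stand-in for the producer's sticky/round-robin state

def choose_partition(key, num_partitions):
    """Pick a partition the way a Kafka producer does (simplified).

    A keyed record hashes deterministically, so the same key always
    lands on the same partition (preserving per-key ordering). A
    keyless record rotates across partitions. The real producer uses
    murmur2; crc32 here is an illustrative substitute.
    """
    if key is None:
        return next(_round_robin) % num_partitions
    return zlib.crc32(key) % num_partitions
```

Because the hash is deterministic, `choose_partition(b"order-42", 6)` returns the same partition on every call, which is what gives Kafka per-key ordering.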
The read path is pull-based. Consumers issue fetch requests to the leader, specifying the offset to read from. The leader reads from the segment file (often served from the OS page cache) and returns a batch of records. The consumer advances its offset.
Kafka's read path exploits zero-copy I/O (sendfile() on Linux). Instead of copying data from the page cache into the Kafka JVM heap and then into the network socket buffer, sendfile() transfers data directly from the OS page cache to the network interface. This eliminates two memory copies per fetch and is a major reason Kafka sustains gigabytes/second of consumer throughput.
Interview tip: Kafka is pull, not push
When asked "how do consumers get data from Kafka," state that consumers pull data by polling the broker. This is a deliberate design choice: pull-based consumption lets each consumer control its own read rate and avoids overwhelming slow consumers. Traditional push-based queues force the consumer to keep up with the broker's pace.
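The pull model above can be shown with an in-memory stand-in for a partition. The `fetch` and `poll_loop` names are assumptions for illustration; the point is that the consumer, not the broker, decides when to fetch and how far its offset advances.

```python
def fetch(log, offset, max_records=100):
    """Broker side of a fetch request: return a batch starting at offset."""
    return log[offset:offset + max_records]

def poll_loop(log):
    """Consumer side: pull batches at its own pace, advancing its own offset.

    A slow consumer simply polls less often or with smaller batches;
    the broker never pushes data at it.
    """
    offset = 0
    consumed = []
    while True:
        batch = fetch(log, offset, max_records=2)
        if not batch:
            break  # caught up with the log end
        consumed.extend(batch)
        offset += len(batch)  # the consumer advances its position
    return consumed
```

Note that nothing in the log is mutated: another consumer with its own offset could run the same loop over the same `log` concurrently.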
Log segments and indexes
Each partition is stored as a series of log segment files on disk. A segment is a pair of files:
- `.log` file: the actual record data (sequential binary, append-only).
- `.index` file: a sparse index mapping offset to byte position in the `.log` file.
When a consumer seeks to offset 1500, Kafka:
- Finds the segment whose base offset is at most 1500 (segment starting at offset 1000).
- Searches the sparse `.index` file for the largest entry no greater than offset 1500, getting a byte position.
- Scans forward from that byte position in the `.log` file until it reaches offset 1500.
This lookup is sub-millisecond for well-sized segments. The sparse index keeps only every Nth entry (configurable via index.interval.bytes, default 4 KB), so index files stay small.
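The seek described above reduces to a binary search over the sparse index followed by a short forward scan. A minimal sketch (the index entries here are made up for illustration):

```python
import bisect

# Sparse index for a segment with base offset 1000:
# (offset, byte_position) pairs, one entry roughly every index.interval.bytes.
sparse_index = [(1000, 0), (1100, 4096), (1300, 8192), (1450, 12288)]

def locate(target_offset):
    """Return the byte position to start scanning from for target_offset.

    Binary-searches for the largest index entry whose offset is <= the
    target; the reader then scans forward in the .log file from there.
    """
    offsets = [off for off, _ in sparse_index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    return sparse_index[i][1]
```

A lookup for offset 1500 lands on the (1450, 12288) entry, so at most one index interval (4 KB by default) of log data is scanned linearly.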
Segment lifecycle:
- New records always append to the active segment.
- When the active segment hits the size limit (`log.segment.bytes`, default 1 GB) or the age limit (`log.roll.ms`), Kafka seals it and creates a new active segment.
- Sealed segments are eligible for deletion based on `retention.ms` (default 7 days) or `retention.bytes`.
- Kafka deletes entire segments at once, never individual records. A segment is only deleted when ALL records in it are past the retention threshold.
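The roll decision is a simple either/or check on size and age. A sketch, with defaults matching the values above (`should_roll` is an illustrative name, not a Kafka API):

```python
import time

def should_roll(segment_bytes, segment_created_at,
                max_bytes=1 << 30,              # log.segment.bytes: 1 GB
                max_age_ms=7 * 24 * 3600 * 1000):  # log.roll.ms: 7 days
    """Decide whether the active segment should be sealed.

    Kafka rolls when either limit is hit, whichever comes first; a
    low-traffic partition still rolls on age, a busy one rolls on size.
    """
    age_ms = (time.time() - segment_created_at) * 1000
    return segment_bytes >= max_bytes or age_ms >= max_age_ms
```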
| Parameter | Default | Effect |
|---|---|---|
| `log.segment.bytes` | 1 GB | Segment file size before rolling |
| `log.roll.ms` | 7 days | Max age of active segment before rolling |
| `retention.ms` | 7 days | How long to keep sealed segments |
| `retention.bytes` | -1 (unlimited) | Max total size per partition |
| `index.interval.bytes` | 4096 | Bytes between sparse index entries |
Small segments → too many file handles
Setting log.segment.bytes too low (e.g., 10 MB) creates thousands of tiny segment files per partition. Each open segment consumes a file descriptor. A broker with 1,000 partitions and small segments can exhaust the OS file descriptor limit, causing broker crashes. Keep segment size at 256 MB or above in production.
ISR replication protocol
Partitions are replicated across brokers for fault tolerance. Each partition has one leader and N-1 followers. Producers write only to the leader. Followers continuously pull (fetch) from the leader and append to their own logs.
In-Sync Replicas (ISR): The set of replicas that are "caught up" with the leader, meaning they have fetched up to the leader's log end within the last `replica.lag.time.max.ms` (default 30 seconds).
A record is considered committed when all replicas in the current ISR have acknowledged it. The high watermark (HW) is the offset of the last committed record. Consumers can only read up to the high watermark. This prevents consumers from reading data that might be lost if the leader fails before replication completes.
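The high watermark falls out of a simple rule: it is the minimum log-end offset across the current ISR, because that is the highest offset guaranteed to exist on every in-sync replica. A sketch (broker names are illustrative):

```python
def high_watermark(log_end_offsets, isr):
    """Compute the high watermark as the minimum log-end offset (LEO)
    among in-sync replicas. Everything below it exists on every ISR
    member, so it survives a leader failure and is safe to expose
    to consumers."""
    return min(log_end_offsets[replica] for replica in isr)

leo = {"broker-1": 120, "broker-2": 118, "broker-3": 95}
# broker-3 lags badly and has been dropped from the ISR, so it does
# not hold the high watermark down:
hw = high_watermark(leo, isr={"broker-1", "broker-2"})  # 118
```

This also shows why a shrinking ISR is dangerous: with fewer members, the high watermark advances on fewer acknowledgements, weakening the durability of "committed".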
ISR dynamics:
- If a follower falls behind by more than `replica.lag.time.max.ms`, the leader removes it from the ISR. The ISR shrinks, and fewer replicas are needed for commitment.
- When the lagging follower catches back up, the leader adds it back to the ISR.
- `min.insync.replicas` (default 1) sets the minimum ISR size for accepting writes with `acks=all`. If the ISR falls below this threshold, the broker rejects writes with a `NotEnoughReplicasException` rather than risking data loss.
I think of min.insync.replicas as the "how much data loss can I tolerate" knob. Set it to 1 and you get availability but risk data loss. Set it to the replication factor and you get maximum durability but reduced availability.
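The write-rejection behavior can be sketched as a leader-side gate on each produce request (the exception class here is a Python stand-in for Kafka's `NotEnoughReplicasException`):

```python
class NotEnoughReplicasError(Exception):
    """Stand-in for Kafka's NotEnoughReplicasException."""

def accept_write(isr_size, acks, min_insync_replicas=2):
    """Leader-side check for a produce request.

    With acks=all, the write is rejected outright if the ISR is
    smaller than min.insync.replicas: the broker prefers to be
    unavailable rather than accept a write it cannot replicate
    to enough brokers.
    """
    if acks == "all" and isr_size < min_insync_replicas:
        raise NotEnoughReplicasError(
            f"ISR has {isr_size} member(s), need {min_insync_replicas}")
    return True
```

Note that `acks=1` writes still succeed when the ISR has shrunk to just the leader, which is exactly where the availability-for-durability trade shows up.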
| Configuration | Behavior | Tradeoff |
|---|---|---|
| `acks=0` | Producer does not wait for any ack | Maximum throughput, no durability guarantee |
| `acks=1` | Producer waits for leader ack only | Leader crash before replication = data loss |
| `acks=all` + `min.insync.replicas=2` | Waits for all ISR (min 2) | Strong durability, slightly higher latency |
| `acks=all` + `min.insync.replicas=RF` | Full replication factor required | Maximum durability, lowest availability |
Leader election: When the current leader fails, the controller (or KRaft quorum in newer Kafka) elects a new leader exclusively from the ISR. This guarantees the new leader has all committed records. The election is fast (typically under a second) because the ISR is already tracked in metadata.
If all ISR members are down simultaneously, Kafka has two options controlled by unclean.leader.election.enable:
- `false` (recommended): The partition stays offline until an ISR member recovers. No data loss, but unavailable.
- `true`: A lagging out-of-sync replica can become leader. The partition becomes available immediately, but committed records that only existed on the old ISR members are permanently lost.
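The election rule is small enough to state as code. A sketch (function and broker names are illustrative; the real controller also considers replica liveness and preferred leaders):

```python
def elect_leader(isr, unclean_election=False, all_replicas=()):
    """Pick a new leader when the current one fails.

    Clean election considers only ISR members, so the new leader is
    guaranteed to hold every committed record. Unclean election falls
    back to any live replica, trading potential loss of committed
    records for immediate availability.
    """
    if isr:
        return sorted(isr)[0]           # any ISR member is safe
    if unclean_election and all_replicas:
        return sorted(all_replicas)[0]  # may be missing committed records
    return None                          # partition stays offline
```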
Consumer group rebalancing
Kafka distributes partitions among consumers within a consumer group. If the group has 3 consumers and a topic has 6 partitions, each consumer gets 2 partitions. When a consumer joins, leaves, or crashes, partitions must be redistributed. This is a rebalance.
The rebalance problem: During a rebalance, all consumers in the group stop processing. For a consumer group with 50 consumers and 200 partitions, a rebalance can take 30-60 seconds. During this window, no messages are processed. In a high-throughput system, this causes a backlog that takes minutes to drain.
Eager vs. cooperative rebalancing:
- Eager (default before Kafka 2.4): All consumers revoke all partitions, then reassign. Full stop-the-world.
- Cooperative (incremental, Kafka 2.4+): Only the partitions that need to move are revoked. Consumers that keep their partitions continue processing uninterrupted. Two-phase: first revoke migrating partitions, then assign them to new owners.
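The difference between the two protocols is which partitions get revoked. A sketch of the cooperative case, which computes only the partitions whose owner actually changes (assignment maps are illustrative):

```python
def rebalance_moves(old_assignment, new_assignment):
    """Return the partitions that must migrate between consumers.

    Under cooperative rebalancing only these partitions pause; every
    consumer whose assignment is unchanged keeps processing. Under the
    eager protocol, every partition in old_assignment would be revoked.
    """
    return {
        partition
        for partition, new_owner in new_assignment.items()
        if old_assignment.get(partition) != new_owner
    }

old = {"p0": "c1", "p1": "c1", "p2": "c2"}
new = {"p0": "c1", "p1": "c3", "p2": "c2"}  # consumer c3 joined, takes p1
# Cooperative: only p1 pauses. Eager: p0, p1, and p2 all pause.
```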
I always recommend cooperative rebalancing in production. The eager protocol is a relic that causes unnecessary downtime during routine scaling events.
Offset management: Each consumer tracks its position in each partition as a committed offset, stored in Kafka's internal __consumer_offsets topic (itself a compacted log).
| Parameter | Default | Effect |
|---|---|---|
| `auto.offset.reset` | latest | Where to start when no committed offset exists (earliest or latest) |
| `enable.auto.commit` | true | Auto-commit offsets periodically |
| `auto.commit.interval.ms` | 5000 | Interval between auto-commits |
| `session.timeout.ms` | 45000 | Time before broker considers consumer dead |
| `max.poll.interval.ms` | 300000 | Max time between poll() calls before rebalance |
Auto-commit can cause data loss or duplicates
With auto-commit enabled, offsets are committed on a timer regardless of whether processing succeeded. If the consumer crashes after auto-commit but before processing, you lose messages. If it crashes after processing but before auto-commit, you get duplicate processing on restart. For exactly-once semantics, disable auto-commit and commit manually after confirming your downstream write succeeded.
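The safe ordering is easy to show in a sketch: process first, commit second. A crash between the two steps causes reprocessing (duplicates), never loss, which is the opposite failure mode of auto-commit. Function names here are illustrative, not a Kafka client API:

```python
def consume_at_least_once(records, process, commit):
    """Manual offset management: commit only after processing succeeds.

    If the consumer crashes after process() but before commit(), the
    record is processed again on restart (at-least-once). With
    auto-commit the timer can fire before processing, losing the record.
    """
    for offset, record in enumerate(records):
        process(record)     # downstream write happens first
        commit(offset + 1)  # then advance the committed offset

committed = []
consume_at_least_once(["a", "b"], process=lambda r: None,
                      commit=committed.append)
# committed == [1, 2]
```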
Log compaction
Standard retention deletes old segments by time or size. Log compaction is a different retention mode: it keeps only the latest value for each key, discarding older values.
The log cleaner thread runs in the background. It reads the "dirty" (uncompacted) portion of the log, builds an in-memory map of key to latest offset, and rewrites the segment files keeping only the latest entry per key.
Key behaviors:
- A tombstone (record with null value) marks a key for deletion. The tombstone is retained for `delete.retention.ms` (default 24 hours) to allow downstream consumers to see the delete, then removed.
- Compaction never reorders records. Offsets are preserved, but gaps appear where old records were removed.
- The active (newest) segment is never compacted. Only sealed segments are candidates.
Use case: Compacted topics are ideal for changelog streams. A Debezium CDC connector captures database changes as a compacted topic. Kafka Streams state stores use compacted topics as their backing store. Any new consumer can rebuild full current state by reading the compacted topic from offset 0 without replaying years of history.
The cleaner's per-segment pass looks roughly like this in Python (an in-memory sketch; the real cleaner rewrites segment files on disk and then swaps them in atomically):

```python
def compact_segment(records, latest_offsets):
    """Keep only the newest record per key in a sealed segment.

    `records` is the segment's records in offset order; `latest_offsets`
    maps each key to the offset of its newest record, built by the
    cleaner's scan of the dirty portion of the log. Offsets are
    preserved; gaps appear where old values were dropped.
    """
    kept = []
    for record in records:
        if latest_offsets[record["key"]] == record["offset"]:
            kept.append(record)  # keep: this is the latest for this key
        # else: discard -- a newer record exists for this key
    return kept
```
Production usage
Kafka is one of the most widely deployed distributed systems in the industry. Nearly every large-scale tech company uses it as the backbone for event streaming, CDC pipelines, and inter-service communication.
| System | Usage | Notable behavior |
|---|---|---|
| LinkedIn | Event sourcing backbone for 7+ trillion messages/day | Kafka was created at LinkedIn. Uses custom extensions for cross-datacenter replication (MirrorMaker). |
| Uber | Real-time trip event streaming and dispatch | Runs 1000+ Kafka brokers. Built uReplicator for multi-region active-active replication. |
| Confluent Cloud | Managed Kafka as a service | Uses tiered storage to offload old segments to object storage (S3), cutting broker disk costs by 50%+. |
| Netflix | Event pipeline for 700+ billion events/day | Uses Kafka with custom consumer libraries that handle offset management and dead letter queues. |
| Datadog | Metrics ingestion pipeline | Processes 40+ trillion data points/day through Kafka. Key design: partition by metric name for locality in downstream aggregation. |
Limitations and when NOT to use it
Kafka is powerful but not universally appropriate. I have seen teams adopt Kafka when a simple HTTP call or a PostgreSQL LISTEN/NOTIFY would have been sufficient.
- Not a database. Kafka lacks indexing, secondary lookups, and ad-hoc queries. If you need to query by anything other than partition + offset, Kafka is the wrong tool. Use it as a transport layer, not a storage layer.
- Partition count is hard to change. Adding partitions redistributes keys, breaking ordering guarantees for existing keys. You cannot reduce partition count without recreating the topic. Plan partition counts carefully upfront.
- Consumer rebalancing causes processing pauses. Even with cooperative rebalancing, adding or removing consumers triggers a brief processing gap. For latency-critical consumers, this is disruptive.
- No per-message routing or filtering at the broker. Unlike RabbitMQ, Kafka has no topic exchanges or header-based routing. Consumers receive all records in their assigned partitions and must filter client-side.
- Operational complexity is high. Running a Kafka cluster requires tuning JVM heap, page cache sizing, disk throughput, replication configs, and ZooKeeper (or KRaft). Managed services (Confluent Cloud, Amazon MSK) reduce but do not eliminate this burden.
- Head-of-line blocking within a partition. If one record in a partition is slow to process, it blocks all subsequent records in that partition for that consumer. There is no per-message parallelism within a single partition.
Interview cheat sheet
- When asked about Kafka's storage model, state that Kafka stores data as an append-only commit log on disk. Records are not deleted after consumption. Multiple consumer groups read independently from the same log at different offsets.
- When asked "how does Kafka achieve high throughput," say sequential disk I/O (append-only writes), OS page cache for reads, zero-copy transfer via `sendfile()`, and batching at both the producer and consumer level.
- When asked about durability guarantees, explain the ISR mechanism: `acks=all` with `min.insync.replicas=2` ensures a record is written to at least 2 brokers before the producer gets an ack. If the ISR shrinks below `min.insync.replicas`, the broker rejects writes.
- When asked about ordering, state that Kafka guarantees order only within a partition, not across partitions. To maintain order for a given entity, use that entity's ID as the partition key.
- When asked about exactly-once semantics, mention Kafka's idempotent producer (sequence numbers per partition) combined with transactional writes. The consumer side requires "read-process-write" with transactional offset commits.
- When asked about consumer group rebalancing, explain eager vs. cooperative rebalancing. Mention that eager causes full stop-the-world and cooperative (incremental) minimizes disruption by only migrating affected partitions.
- When asked about retention, distinguish time/size-based deletion (whole segments deleted) from log compaction (latest value per key kept). Compacted topics are perfect for CDC and state store changelogs.
- When asked to compare Kafka vs. traditional queues, emphasize the log model: non-destructive reads, consumer-controlled offset, replay capability, and partition-based parallelism. Traditional queues delete on dequeue and push messages to consumers.
Quick recap
- Kafka stores partitions as append-only log segments on disk: sequential `.log` files with sparse `.index` files for offset-to-byte-position lookups.
- The ISR (in-sync replicas) set tracks which replicas are caught up with the leader. Records are committed only when all ISR members acknowledge. `min.insync.replicas` controls the minimum ISR size for accepting writes.
- Consumer groups distribute partitions among consumers. Rebalancing (triggered by consumer joins, leaves, or crashes) can pause processing. Cooperative rebalancing minimizes disruption by migrating only affected partitions.
- Log compaction retains only the latest value per key, enabling stateful consumers and CDC workflows to rebuild state without replaying full history.
- Kafka achieves high throughput through sequential disk I/O, OS page cache reads, zero-copy transfer, and batching. It is a log, not a queue: consumption is non-destructive and offset-controlled.
- Use `acks=all` with `min.insync.replicas=2` for durable topics. Disable `unclean.leader.election.enable` for any topic where data loss is unacceptable. Disable auto-commit for exactly-once processing.
Related concepts
- Message queues - Kafka is often compared to traditional message queues, but its log-based model differs fundamentally in how it handles consumption and retention.
- Event-driven architecture - Kafka is the most common transport layer for event-driven systems, providing the durable event log that decouples producers from consumers.
- Write-ahead log - Kafka's partition is essentially a WAL. The same append-only, sequential write pattern that makes WALs fast makes Kafka fast.
- Zero-copy I/O - Kafka uses `sendfile()` to transfer data from disk to network socket without copying through userspace, a key reason for its high throughput.