How Redis works internally
Redis handles data at microsecond latency using a single-threaded event loop, carefully tuned data structures, and flexible persistence. Here is what is happening under the hood.
The Problem Statement
Interviewer: "We are considering putting Redis in front of our database to handle 100,000 requests per second. Redis is in-memory so it's fast, but can you walk me through how it actually processes that many concurrent requests? Does it use thread pools internally? And if the process restarts, do we lose all our data?"
This question separates candidates who know the Redis API (HSET, ZADD, EXPIRE, ZINCRBY) from those who understand the engine underneath it. The interviewer is testing three things simultaneously: your model of the event loop, your understanding of persistence tradeoffs, and whether you know what "in-memory" actually means for durability. Candidates who answer "Redis is fast because it's in-memory" are technically correct but miss the interesting parts.
The hidden rubric: the interviewer wants to know if you can reason about when Redis will surprise a team, how to configure it for a given durability requirement, and whether you understand the failure modes at scale.
Clarifying the Scenario
Before answering, I take 30 seconds to narrow the scope.
You: "A few quick questions. When you say 100,000 RPS, is that mostly reads or a mix? And are we talking a single Redis instance or a cluster setup?"
Interviewer: "Mostly reads, maybe 80/20 read/write. Single instance for now, potentially cluster later."
You: "And the durability requirement: if the Redis process crashes, can we reconstruct the cache from the database behind it, or does Redis need to be the authoritative source?"
Interviewer: "Redis is a cache in front of the database. We can reconstruct, but we would prefer to minimize warm-up time after a restart."
You: "Perfect. I will cover three things: first, how the single-threaded event loop handles 100K RPS without parallelism, which surprises most people. Second, how the internal data structure encodings work and why they matter for capacity planning. Third, how persistence modes work given your cache recovery requirement."
That structure directly reflects the two concerns the interviewer expressed (throughput and durability) while signaling that there is depth coming.
My Approach
I think about Redis internals in four layers, each of which explains different failure modes and configuration decisions:
- The event loop: One thread, non-blocking I/O multiplexing via epoll on Linux, thousands of concurrent connections with zero thread-switching overhead.
- The data structure engine: High-level types (String, Hash, List, Set, Sorted Set) backed by compact internal encodings that switch automatically based on collection size.
- Persistence: RDB snapshots via fork and copy-on-write (compact binary), AOF logs with configurable fsync, and hybrid mode for fast restarts with recent-data durability.
- High availability: Sentinel for automatic failover on standalone setups, Redis Cluster for horizontal sharding using 16,384 hash slots with a gossip protocol.
Understanding each layer explains the operational surprises that catch teams: why KEYS * in production causes a cascade of client timeouts, why memory usage jumps sharply at specific collection sizes, why RDB plus AOF together is safer than either alone, and why Redis Cluster requires careful key design for multi-key operations.
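To make the key-design point concrete, here is a minimal sketch of how Redis Cluster maps a key to one of its 16,384 slots: CRC16 (the XMODEM variant) of the key, modulo 16384, with a hash tag in braces used to force related keys into the same slot. The function names `crc16_xmodem` and `key_slot` are my own for illustration, not Redis APIs.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster slot for a key, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

Keys such as `{user1000}.following` and `{user1000}.followers` hash only the `user1000` part, so they share a slot and can appear together in a multi-key command.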
The Architecture
Here is the full picture of a Redis request from TCP connection arrival to response delivery:
The architecture looks deceptively simple: one thread, one event loop, one key space. That simplicity is the point. The absence of locking means Redis can execute millions of small commands per second without the synchronization overhead that dominates multi-threaded databases.
Since Redis 6.0, optional threaded I/O can handle socket reads and writes on extra threads alongside the main thread. It is off by default and is enabled with the io-threads setting. Command execution stays single-threaded, which is why the feature is called "threaded I/O" rather than a fully parallel architecture. For most workloads dominated by small commands with high concurrency, the I/O threads provide a meaningful throughput improvement without changing the atomicity guarantees.
Walking through the diagram: a client sends a RESP-encoded command. The epoll loop wakes when that socket becomes readable. The RESP parser assembles the command tokens. The command dispatcher looks up a hash map of command names to handler functions, executes the handler against the in-memory data structure, and pushes the result into a reply buffer. The buffer flushes back to the socket. The entire roundtrip is measured in microseconds.
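For a feel of what "RESP-encoded command" means on the wire, here is a small sketch of the request format: an array header followed by one length-prefixed bulk string per token. The helper names `encode_command` and `parse_command` are hypothetical, and the parser assumes a single complete, well-formed command in the buffer, which a real server cannot assume.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_command(buf: bytes) -> list:
    """Parse one complete RESP array of bulk strings into its tokens."""
    pos = buf.index(b"\r\n")
    count = int(buf[1:pos])          # "*<count>"
    pos += 2
    args = []
    for _ in range(count):
        end = buf.index(b"\r\n", pos)
        length = int(buf[pos + 1:end])   # "$<length>"
        start = end + 2
        # Read by length, so payloads may themselves contain CRLF.
        args.append(buf[start:start + length].decode())
        pos = start + length + 2
    return args
```

Because every token carries its length, the parser never scans payload bytes for delimiters, which keeps parsing cheap inside the event loop.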
Pipelining and batching
RESP supports pipelining: a client can send multiple commands without waiting for individual responses. The event loop reads all buffered commands from the socket, executes each one, and returns all responses in a single write. Pipelining reduces round trips (the dominant latency source for network-bound workloads) from N to 1 for a batch of N commands. I use pipelining any time I need to do 10+ operations in a request handler, for example warming a batch of cache keys or updating multiple counters.
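A back-of-envelope model shows why the round-trip count dominates. The figures in the usage note (0.5 ms network round trip, 5 microseconds of server time per command) are illustrative assumptions, not measurements, and `batch_latency_us` is a hypothetical helper.

```python
def batch_latency_us(n_commands: int, rtt_us: int, exec_us: int, pipelined: bool) -> int:
    """Total time for a batch: network round trips plus server execution."""
    round_trips = 1 if pipelined else n_commands
    return round_trips * rtt_us + n_commands * exec_us


one_by_one = batch_latency_us(100, rtt_us=500, exec_us=5, pipelined=False)
pipelined = batch_latency_us(100, rtt_us=500, exec_us=5, pipelined=True)
```

Under these assumptions, 100 commands take about 50.5 ms sent one at a time and about 1 ms pipelined. The server does the same work in both cases; only the waiting on the network changes.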
Transactions (MULTI/EXEC) queue commands and execute them atomically when EXEC arrives. Other clients' commands can still run while the transaction is being queued, but none can interleave once EXEC starts running the queued batch. However, Redis transactions do not support conditional logic within the transaction itself: queued commands return nothing until EXEC, so you cannot read a value and branch based on it. For conditional operations, use a Lua script (EVAL), which executes atomically and can contain arbitrary logic. Lua scripts run in the event loop like any other command.
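The toy model below shows what atomic means for MULTI/EXEC: commands sent after MULTI are only queued, and the whole batch runs contiguously when EXEC arrives. `ToyServer` is a hypothetical stand-in written for this article, not how Redis is implemented.

```python
class ToyServer:
    """Toy model of MULTI/EXEC queuing on a single-threaded server."""

    def __init__(self):
        self.data = {}
        self.queues = {}  # client id -> commands queued since MULTI
        self.log = []     # order in which commands actually executed

    def _apply(self, client, cmd):
        op, key = cmd
        if op == "INCR":
            self.data[key] = self.data.get(key, 0) + 1
        self.log.append((client, op, key))
        return self.data.get(key)

    def send(self, client, cmd):
        if cmd == ("MULTI",):
            self.queues[client] = []
            return "OK"
        if cmd == ("EXEC",):
            # The queued batch runs back to back; nothing else can interleave.
            return [self._apply(client, c) for c in self.queues.pop(client)]
        if client in self.queues:
            self.queues[client].append(cmd)
            return "QUEUED"
        return self._apply(client, cmd)


server = ToyServer()
server.send("A", ("MULTI",))
server.send("A", ("INCR", "hits"))      # queued, not executed yet
server.send("B", ("INCR", "hits"))      # another client runs immediately
server.send("A", ("INCR", "hits"))      # queued
result = server.send("A", ("EXEC",))    # both of A's commands run together
```

Client B's increment lands before A's batch, so A sees 2 and 3 rather than 1 and 2. Atomicity covers the execution of the batch, not the window during which it is being queued.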
Why single-threaded beats multi-threaded for small commands
For commands that complete in microseconds, the overhead of thread synchronization (mutex acquire, cache-line invalidation, context switch) exceeds the cost of the command itself. Redis eliminates all of that. The tradeoff is that one slow command blocks every other client for its full duration.
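The blocking tradeoff is easy to see with a tiny model of a single-threaded queue. The durations are made-up illustrative values (three 1-microsecond commands and one 200 ms scan), and `completion_times_us` is a hypothetical helper.

```python
def completion_times_us(durations_us: list) -> list:
    """When each queued command finishes if one thread runs them in order."""
    finished, clock = [], 0
    for duration in durations_us:
        clock += duration
        finished.append(clock)
    return finished


# A fast command, then a slow full-keyspace scan, then two more fast commands.
times = completion_times_us([1, 200_000, 1, 1])
```

The two commands queued behind the slow one each cost 1 microsecond of work but finish more than 200 ms after arrival, which is the client-timeout cascade that a production `KEYS *` produces.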
Replication and Sentinel
Redis replication is asynchronous primary-replica. The PSYNC protocol handles both full and partial synchronization:
Replica connects and sends:
PSYNC <replication-id> <offset>
Primary responds with one of:
FULLRESYNC <replication-id> <offset> --> send full RDB snapshot
CONTINUE --> send incremental from offset
The replication-id (repl ID) is a 40-character random string that uniquely identifies a primary's data history. When a replica promotes to primary during a failover, it generates a new repl ID and records the old one as repl_id2. Replicas trying to reconnect after failover can still do a partial resync if their offset is within the new primary's knowledge of the old primary's history.
The replication backlog is a fixed-size ring buffer (default 1MB) of recent write commands. A replica that reconnects within the backlog window sends its offset and receives only the delta. A replica that was offline too long (longer than backlog_size / write_rate seconds) triggers a full resync: the primary forks a child process to write an RDB snapshot, sends the entire RDB, and then streams the writes that accumulated in the meantime.
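The backlog window is simple arithmetic worth doing during capacity planning. This sketch assumes a steady write rate, which real traffic rarely has, and the helper names are hypothetical.

```python
def backlog_window_seconds(backlog_bytes: int, write_bytes_per_sec: int) -> float:
    """How long a replica can be offline and still get a partial resync."""
    return backlog_bytes / write_bytes_per_sec


def needs_full_resync(offline_seconds: float, backlog_bytes: int,
                      write_bytes_per_sec: int) -> bool:
    """True when the replica's offset has already been overwritten."""
    return offline_seconds > backlog_window_seconds(backlog_bytes, write_bytes_per_sec)


# Default 1MB backlog with an assumed 100 KB/s of replicated writes.
window = backlog_window_seconds(1024 * 1024, 100 * 1024)
```

With those assumed numbers the window is roughly 10 seconds, so a replica that restarts in 30 seconds forces a full resync. Sizing the backlog to cover a typical restart or network blip avoids that.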
Redis 4.0 introduced secondary replication IDs (repl_id2). When a replica is promoted to primary during a Sentinel failover, it saves the old primary's repl ID as repl_id2 and records the offset at which it became primary as repl_id2_offset. Other replicas reconnecting to the new primary can do a partial resync if their offset is below repl_id2_offset (they were within the old primary's backlog) and they identify themselves with repl_id2. Without this, every failover required a full resync of all remaining replicas, which meant sending a 10GB+ snapshot to each replica after every failover event.
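The partial-resync decision after a failover can be sketched as a pure function. This is a simplified model of the rule described above, not the Redis source: the `PrimaryState` fields are named after this article's terms, and treating the offset boundaries as inclusive is my assumption.

```python
from dataclasses import dataclass


@dataclass
class PrimaryState:
    repl_id: str          # current replication ID
    repl_id2: str         # previous primary's ID, kept after promotion
    repl_id2_offset: int  # offset up to which the repl_id2 history is valid
    backlog_start: int    # oldest offset still held in the backlog
    backlog_end: int      # newest offset written to the backlog


def can_partial_resync(primary: PrimaryState, replica_id: str, replica_offset: int) -> bool:
    """Decide between CONTINUE (True) and FULLRESYNC (False)."""
    same_history = replica_id == primary.repl_id or (
        replica_id == primary.repl_id2
        and replica_offset <= primary.repl_id2_offset
    )
    in_backlog = primary.backlog_start <= replica_offset <= primary.backlog_end
    return same_history and in_backlog


# A replica promoted at offset 5000 that still holds offsets 4000..6000.
new_primary = PrimaryState("new-id", "old-id", 5000, 4000, 6000)
```

A sibling replica presenting `old-id` at offset 4800 gets a partial resync. One presenting `old-id` at 5500 claims history the old primary never shared with the new one, so it falls back to a full resync.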
Sentinel runs as a separate process outside of Redis and monitors primary-replica clusters. When the primary becomes unreachable:
- Each Sentinel independently declares it SDOWN (subjectively down) after down-after-milliseconds of failed pings.
- Once a quorum of Sentinels agree the primary is ODOWN (objectively down), one Sentinel is elected leader.
- The leader selects the most up-to-date replica (highest replication offset) and sends it a SLAVEOF NO ONE command.
- The other replicas are reconfigured to follow the new primary.
- Clients that subscribe to Sentinel receive a notification and update their connection target.
The minimum recommended Sentinel topology is three Sentinels on separate physical hosts with a quorum of two. This prevents a single Sentinel process failure from blocking failover (no quorum) or a network partition from triggering false failovers on both sides of the partition simultaneously.
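The topology advice can be checked with a few lines of arithmetic. This sketch relies on a detail from the Sentinel documentation that the text above leaves implicit: marking the primary ODOWN needs the configured quorum, while actually starting a failover also needs a majority of all known Sentinels. The function name is hypothetical.

```python
def can_fail_over(total_sentinels: int, reachable: int, quorum: int) -> bool:
    """Can the Sentinels on one side of a failure or partition fail over?"""
    majority = total_sentinels // 2 + 1
    # Quorum marks the primary ODOWN; a majority authorizes the leader.
    return reachable >= quorum and reachable >= majority
```

With three Sentinels and a quorum of two, losing one Sentinel still allows failover (`can_fail_over(3, 2, 2)`), while the isolated side of a 1-versus-2 partition cannot act (`can_fail_over(3, 1, 2)`). Two Sentinels cannot survive the loss of either one, whatever the quorum.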
Sentinel does not shard data. It provides HA for a single primary with up to N replicas. If you need horizontal scaling beyond what one primary can handle, you need Redis Cluster.
When to use Sentinel vs Cluster
Use Sentinel when a single primary handles your write throughput and you want automatic failover. Use Cluster when you need to horizontally scale writes or your dataset exceeds a single node's memory. You can run replicas within Cluster too: each Cluster primary can have one or more replicas for HA, managed by Cluster itself rather than Sentinel.
Single-Threaded Event Loop vs Multi-Threaded Databases
Traditional multi-threaded databases such as MySQL give each connection a dedicated thread or thread pool entry (PostgreSQL uses a process per connection, with the same shared-state coordination problem). When two workers need the same data, one acquires a lock and the other blocks, then the OS schedules a context switch. For queries that take hundreds of milliseconds, this overhead is negligible. For Redis commands that take microseconds, it is not.
A mutex acquire/release takes 20-100 nanoseconds. Cache-line invalidation across CPU cores adds more. If a thread actually yields (context switch), that costs thousands of nanoseconds. A Redis GET command that completes in 200 nanoseconds cannot afford any of this.
Redis eliminates all of it by running one thread. There is no other thread to conflict with. Commands are always atomic without any explicit locking.
The ae.c event loop at its core looks like this:
// Simplified from ae.c in the Redis source
while (!stop) {
    // Block here until at least one socket has events (readable/writable)
    numEvents = aeApiPoll(eventLoop, timeout); // epoll_wait on Linux
    for (int i = 0; i < numEvents; i++) {
        int fd = eventLoop->fired[i].fd;     // socket that became ready
        int mask = eventLoop->fired[i].mask; // which events fired on it
        aeFileEvent *fe = &eventLoop->events[fd];
        if (fe->mask & mask & AE_READABLE) {
            // reads RESP bytes from socket, builds command array
            fe->rfileProc(eventLoop, fd, fe->clientData, mask);
        }
        if (fe->mask & mask & AE_WRITABLE) {
            // flushes reply buffer to socket
            fe->wfileProc(eventLoop, fd, fe->clientData, mask);
        }
    }
    // Run time events (expire, serverCron, etc.)
    processTimeEvents(eventLoop);
}
For a readable client socket, the file event handler reads the incoming RESP bytes and calls processInputBuffer, which calls processCommand, which dispatches to the appropriate command implementation (e.g., getCommand, hsetCommand). The reply is appended to the client's output buffer. Redis tries to flush that buffer directly before the next poll and installs a writable event handler only when the socket cannot accept the whole reply at once.
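The same poll-then-dispatch shape fits in a few lines of Python using the standard `selectors` module, which wraps epoll on Linux. This is a toy sketch, not Redis: the line-based SET/GET protocol stands in for RESP, and `run_once` and `handle_readable` are hypothetical names.

```python
import selectors
import socket


def handle_readable(conn, store):
    """Read buffered commands, execute each, and reply in a single write."""
    replies = []
    for line in conn.recv(4096).decode().splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "SET":
            store[parts[1]] = parts[2]
            replies.append("+OK")
        elif parts[0] == "GET":
            replies.append(store.get(parts[1], "(nil)"))
    conn.sendall(("\n".join(replies) + "\n").encode())


def run_once(sel, store):
    """One loop iteration: poll, then run the handler for each ready socket."""
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj, store)  # key.data holds the registered handler


sel = selectors.DefaultSelector()
server_side, client_side = socket.socketpair()
sel.register(server_side, selectors.EVENT_READ, data=handle_readable)

store = {}
client_side.sendall(b"SET greeting hello\nGET greeting\n")  # two pipelined commands
run_once(sel, store)
reply = client_side.recv(4096)

sel.unregister(server_side)
server_side.close()
client_side.close()
sel.close()
```

One thread owns `store`, so no lock is needed around it. Both pipelined commands are handled in one wake-up of the loop and answered with one write.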