Discord: Go to Rust migration
How Discord rewrote a critical service from Go to Rust to eliminate GC pauses, achieving consistent sub-millisecond p99 latency and lower memory usage.
TL;DR
- Discord rewrote its Read States service from Go to Rust in 2020 to eliminate garbage collector pauses that spiked p99 latency every two minutes.
- The Go service showed p99 spikes to ~10ms on a predictable two-minute cycle, correlating exactly with Go's stop-the-world GC phases.
- The LRU cache at the core of the service processed ~50 billion entry evictions per day, creating massive allocation churn that overwhelmed GC tuning.
- After the Rust rewrite, p99 latency dropped to a consistent ~500µs with zero GC-related spikes.
- Memory usage fell ~30% because Rust structs carry no GC headers and the allocator has no runtime scanning overhead.
- The service handles billions of read-state updates daily across ~250M registered users and ~8.5M concurrent users (2020 numbers).
- Transferable lesson: for latency-sensitive services with high allocation churn, garbage collection is a real tail-latency risk, and the fix is not always "tune the GC."
The Trigger
By mid-2020, Discord's Read States service had a latency graph that looked like a heartbeat monitor. Every two minutes, p99 latency spiked from ~500µs to ~10ms, then dropped back down. The pattern was metronomic. Users did not see raw latency numbers, but they felt the intermittent sluggishness when switching servers and loading channel lists.
The Read States service tracks which messages each user has read across every channel and guild. At Discord's scale (roughly 250 million registered users and 8.5 million concurrent users), this means millions of cache entries churning constantly as users open channels, receive messages, and switch servers. The service maintained an in-memory LRU cache that performed approximately 50 billion entry evictions per day.
Engineers on the infrastructure team profiled the service and found the spikes correlated exactly with Go's garbage collector cycles. Every time the GC kicked in to scan and reclaim evicted cache entries, it introduced stop-the-world pauses that pushed tail latency to unacceptable levels. I've seen similar patterns in Java services with large heaps, but Discord's case was particularly stark because the spike was so regular and so directly attributable to GC.
Why tail latency matters here
A 10ms p99 spike might sound small, but Discord's client aggregates multiple Read States calls per screen load. If a user belongs to 50 guilds, the client fans out requests, and the slowest response determines perceived load time. A 10ms p99 on one call becomes a near-certainty of hitting at least one slow response across 50 parallel calls.
The team had already tried tuning Go's GC. They adjusted GOGC, experimented with memory ballast techniques, and restructured data to reduce pointer density. None of it eliminated the fundamental problem: Go's GC must scan live objects, and a high-churn LRU cache generates enormous volumes of short-lived allocations that the GC must process.
The scale of the problem
To understand why this mattered, consider the math. A p99 of 10ms means that 1 in 100 requests hits a spike. If a single user session triggers 50 parallel Read States queries (one per guild the user belongs to), and the queries are treated as independent, the probability of at least one slow response is 1 - (0.99)^50 ≈ 39.5%. Nearly four in ten page loads felt slow.
For power users in 100+ guilds, the probability climbed above 63%. These were Discord's most engaged users, the moderators and community builders who drive platform stickiness. Making their experience worse was not acceptable.
Why this was not a temporary problem
The LRU churn rate was proportional to daily active users. As Discord grew (2020 was a breakout year, partly driven by pandemic-era remote communication), the cache churn rate grew with it. The GC pressure was not a spike; it was a structural property of the workload that would only get worse over time.
The System Before
How the Go service worked
Each Go instance maintained a large in-memory LRU cache mapping (user_id, channel_id) pairs to read-state structs. When a user opened a channel, the service checked the cache first. On a cache hit (the common case), it returned the read position in microseconds. On a miss, it read from Cassandra and populated the cache.
The cache used a standard hash map plus doubly-linked list implementation for O(1) lookups and O(1) evictions. Each entry was a Go struct with a channel ID, last-read message ID, mention count, and timestamps. The struct itself was small (roughly 80 bytes), but at millions of entries per instance, the aggregate heap was substantial.
// Simplified Go read-state struct
type ReadStateEntry struct {
    ChannelID     uint64
    LastMessageID uint64
    MentionCount  uint32
    LastPinTS     uint64
}

// LRU eviction creates GC pressure:
// 1. New entry arrives → allocate ReadStateEntry on heap
// 2. LRU full → evict oldest entry → unlink from list
// 3. Evicted entry becomes garbage → GC must find and free it later
// 4. Hundreds of thousands of evictions/sec → massive GC scan workload
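To make the map-plus-linked-list design concrete, here is a minimal sketch of such an LRU cache built on the standard `container/list` package. It is not Discord's implementation; the `Key`, `Entry`, and `LRU` types are hypothetical stand-ins, and the sketch ignores locking and persistence.

```go
package main

import (
	"container/list"
	"fmt"
)

// Key and Entry are hypothetical stand-ins for the service's real types.
type Key struct {
	UserID    uint64
	ChannelID uint64
}

type Entry struct {
	LastMessageID uint64
	MentionCount  uint32
}

// item is what each list element stores, so eviction can find the map key.
type item struct {
	key   Key
	value *Entry
}

// LRU combines a map (O(1) lookup) with a doubly-linked list (O(1)
// reordering and eviction). The front of the list is the most recent entry.
type LRU struct {
	capacity int
	order    *list.List
	index    map[Key]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), index: make(map[Key]*list.Element)}
}

// Get returns the entry and marks it as most recently used.
func (c *LRU) Get(k Key) (*Entry, bool) {
	el, ok := c.index[k]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*item).value, true
}

// Put inserts or updates an entry and evicts the least recently used one
// when the cache is over capacity. The evicted entry and its list element
// become garbage that the collector has to reclaim later.
func (c *LRU) Put(k Key, v *Entry) {
	if el, ok := c.index[k]; ok {
		el.Value.(*item).value = v
		c.order.MoveToFront(el)
		return
	}
	c.index[k] = c.order.PushFront(&item{key: k, value: v})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.index, oldest.Value.(*item).key)
	}
}

func (c *LRU) Len() int { return c.order.Len() }

func main() {
	cache := NewLRU(2)
	cache.Put(Key{1, 10}, &Entry{LastMessageID: 100})
	cache.Put(Key{1, 11}, &Entry{LastMessageID: 200})
	cache.Get(Key{1, 10})                             // touch: {1,10} is now most recent
	cache.Put(Key{1, 12}, &Entry{LastMessageID: 300}) // evicts {1,11}
	_, ok := cache.Get(Key{1, 11})
	fmt.Println("evicted entry still cached:", ok)
}
```

Every `Put` on a full cache allocates one `item`, one list element, and one `Entry`, and orphans the same three objects from the evicted slot. That per-eviction garbage is the churn the article describes.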
The request lifecycle
A typical request flow looked like this:
- User opens a channel in the Discord client.
- The client sends a read-state query via WebSocket to the gateway.
- The gateway routes the request to the correct Read States instance based on user ID shard.
- The service checks the LRU cache. On hit (>90% of requests), it returns immediately.
- On miss, it queries Cassandra, populates the cache, and returns.
- When the user reads messages, the client sends an update. The service writes to the cache and asynchronously persists to Cassandra.
This lifecycle generated constant cache churn. Every cache population is a heap allocation. Every eviction (when the cache reaches capacity) creates garbage. At peak traffic, each instance was allocating and deallocating hundreds of thousands of structs per second.
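The read path of that lifecycle (check the cache, fall back to the database on a miss, then populate the cache) can be sketched as follows. The `Store` interface and `mapStore` fake are assumptions standing in for the Cassandra client, and a plain map stands in for the LRU; the asynchronous write path is left out.

```go
package main

import (
	"errors"
	"fmt"
)

type Key struct{ UserID, ChannelID uint64 }

type ReadState struct {
	LastMessageID uint64
	MentionCount  uint32
}

// Store is a hypothetical interface standing in for the Cassandra client.
type Store interface {
	Load(k Key) (ReadState, error)
}

var ErrNotFound = errors.New("read state not found")

// mapStore is an in-memory fake used only to make the sketch runnable.
// It counts loads so cache hits can be observed.
type mapStore struct {
	rows  map[Key]ReadState
	loads int
}

func (s *mapStore) Load(k Key) (ReadState, error) {
	s.loads++
	rs, ok := s.rows[k]
	if !ok {
		return ReadState{}, ErrNotFound
	}
	return rs, nil
}

// Service puts a cache in front of the store. A plain map stands in for
// the LRU here, so nothing is ever evicted in this sketch.
type Service struct {
	cache map[Key]ReadState
	store Store
}

func NewService(store Store) *Service {
	return &Service{cache: make(map[Key]ReadState), store: store}
}

// Get serves from the cache on a hit. On a miss it loads from the store
// and populates the cache before returning.
func (s *Service) Get(k Key) (ReadState, error) {
	if rs, ok := s.cache[k]; ok {
		return rs, nil
	}
	rs, err := s.store.Load(k)
	if err != nil {
		return ReadState{}, err
	}
	s.cache[k] = rs
	return rs, nil
}

func main() {
	store := &mapStore{rows: map[Key]ReadState{{UserID: 1, ChannelID: 10}: {LastMessageID: 100}}}
	svc := NewService(store)
	svc.Get(Key{1, 10}) // miss: goes to the store
	svc.Get(Key{1, 10}) // hit: served from the cache
	fmt.Println("store loads:", store.loads)
}
```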
What worked
The architecture was sound. Sharding by user ID distributed load evenly. Cassandra provided durable storage with tunable consistency. The LRU cache delivered sub-millisecond reads for the hot path. Average latency was excellent.
What didn't work
The Go garbage collector scans the entire heap to identify live objects. With millions of LRU entries being created and evicted continuously, the scanner had enormous work to do. Each GC cycle took long enough to push p99 latency from ~500µs to ~10ms.
The problem was structural, not configurational. Go's GC is one of the best in the managed-language world, but it operates on a fundamental constraint: it must trace reachable objects at runtime. A workload that creates and destroys millions of small objects per second is the worst case for any tracing GC.
I've worked with teams that spent months tuning JVM GC parameters for similar workloads. The result is usually a game of whack-a-mole: you reduce one pause type and another emerges. Discord's team arrived at the same conclusion for Go's collector.
Why Not Just Tune the Go GC?
The obvious first question: why not fix the GC instead of rewriting the service? Discord's team tried several approaches before concluding that a rewrite was necessary.
Approach 1: Adjust GOGC
Go's GOGC environment variable controls how aggressively the GC runs. A higher value (say, GOGC=800) lets the heap grow larger before triggering collection, reducing GC frequency but increasing memory usage and making each GC pause longer. A lower value triggers GC more often but with shorter pauses.
Discord tried both directions. Higher GOGC reduced spike frequency but made each spike worse (longer pauses on a larger heap). Lower GOGC made spikes more frequent. Neither eliminated the fundamental problem.
The default GOGC=100 means the GC triggers when allocations since the last collection equal the live heap size. For a service with a 4GB live heap (millions of cached entries), this means the GC runs after roughly 4GB of new allocations, which at Discord's allocation rate happened every two minutes. Doubling GOGC to 200 delayed GC to every four minutes, but each pause took twice as long because the heap to scan was twice as large.
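The GOGC arithmetic in this paragraph can be written out as a small calculation. This is a simplified model, not the runtime's exact pacer: it treats the trigger as live heap plus GOGC percent of the live heap and ignores stacks and globals, which newer Go versions also count. The function names and the allocation rate (derived from the article's "4GB every two minutes") are assumptions for illustration.

```go
package main

import "fmt"

// HeapGoal approximates the heap size at which the next collection starts:
// the live heap plus GOGC percent of it.
func HeapGoal(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

// SecondsBetweenCycles estimates the time between collections for a steady
// allocation rate: the headroom above the live heap divided by the rate.
func SecondsBetweenCycles(liveBytes uint64, gogc int, allocBytesPerSec uint64) float64 {
	headroom := HeapGoal(liveBytes, gogc) - liveBytes
	return float64(headroom) / float64(allocBytesPerSec)
}

func main() {
	const live = 4 << 30         // 4 GiB live heap, the article's example
	const allocRate = live / 120 // bytes/sec implied by one cycle per two minutes at GOGC=100
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("GOGC=%d: heap goal %d GiB, ~%.0f s between cycles\n",
			gogc, HeapGoal(live, gogc)>>30, SecondsBetweenCycles(live, gogc, allocRate))
	}
}
```

In a running program the same knob can be changed with `debug.SetGCPercent` from `runtime/debug` instead of the environment variable.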
Approach 2: Memory ballast
A common Go optimization is to allocate a large, never-freed byte slice (the "ballast") to inflate the heap baseline. Because the GC trigger is proportional to the live heap, a larger baseline leaves more allocation headroom between cycles, so the GC runs less often. Discord experimented with ballast sizes from 1GB to 8GB.
The ballast helped marginally. GC frequency dropped, but when collection did run, it still had to scan the millions of live LRU entries. The spike magnitude was unchanged. The ballast technique works well for services with bursty allocation patterns; it does not help when the allocation pressure is constant and structural.
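A minimal sketch of the ballast technique, using only the standard `runtime` package. The helper name `NextGCWithBallast` is made up, and the 64 MiB size is chosen to keep the demo cheap; the article's experiments used 1GB to 8GB.

```go
package main

import (
	"fmt"
	"runtime"
)

// NextGCWithBallast allocates a ballast of the given size, forces a
// collection, and reports the heap size at which the next cycle would start.
// The ballast is never written to; it only raises the live-heap baseline.
func NextGCWithBallast(size int) uint64 {
	ballast := make([]byte, size)
	runtime.GC()
	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	runtime.KeepAlive(ballast) // keep the ballast reachable until after the measurement
	return stats.NextGC
}

func main() {
	fmt.Printf("next GC without ballast:     %d MiB\n", NextGCWithBallast(0)>>20)
	fmt.Printf("next GC with 64 MiB ballast: %d MiB\n", NextGCWithBallast(64<<20)>>20)
}
```

In a real service the ballast would be allocated once at startup and kept alive for the life of the process. A byte slice contains no pointers, so the collector does not scan its contents, but as the article notes, that does nothing for the live LRU entries that still have to be traced.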
Approach 3: Reduce pointer density
Go's GC traces pointers. Structs with fewer pointer fields mean less work per GC cycle. Discord restructured some data to use value types instead of pointer types where possible. This reduced GC scan time but did not eliminate it.
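One way to picture this restructuring is a linked list whose links are slice indices instead of pointers. This is a sketch of the general technique, not Discord's actual change; the `Arena`, `indexNode`, and `pointerNode` types are invented for illustration. A slice whose element type contains no pointers does not need its contents scanned by the collector.

```go
package main

import "fmt"

type Entry struct {
	LastMessageID uint64
	MentionCount  uint32
}

// pointerNode is the pointer-heavy layout: the GC must follow prev, next,
// and value for every node. It is shown only for comparison.
type pointerNode struct {
	prev, next *pointerNode
	value      *Entry
}

// indexNode stores the same links as indices into a backing slice and
// holds the entry by value, so the node contains no pointers at all.
type indexNode struct {
	prev, next int32 // -1 means "no neighbor"
	value      Entry
}

// Arena is a list of indexNodes kept in one contiguous slice.
type Arena struct {
	nodes []indexNode
	head  int32 // index of the most recent node, -1 when empty
}

func NewArena() *Arena { return &Arena{head: -1} }

// PushFront appends a node and links it in as the new head.
func (a *Arena) PushFront(e Entry) int32 {
	idx := int32(len(a.nodes))
	a.nodes = append(a.nodes, indexNode{prev: -1, next: a.head, value: e})
	if a.head >= 0 {
		a.nodes[a.head].prev = idx
	}
	a.head = idx
	return idx
}

// Walk returns message IDs from most recent to oldest by following indices.
func (a *Arena) Walk() []uint64 {
	var ids []uint64
	for i := a.head; i >= 0; i = a.nodes[i].next {
		ids = append(ids, a.nodes[i].value.LastMessageID)
	}
	return ids
}

func main() {
	a := NewArena()
	for _, id := range []uint64{100, 200, 300} {
		a.PushFront(Entry{LastMessageID: id})
	}
	fmt.Println(a.Walk())
}
```

The trade-off is that slot reuse and bounds become the programmer's responsibility, which is a small step toward manual memory management.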
The 'just tune GC' trap
When profiling reveals GC as the bottleneck, the instinct is to tune GC parameters. This works when GC pressure comes from temporary spikes or suboptimal allocation patterns. It does not work when the workload fundamentally conflicts with GC's operating model: millions of small, short-lived allocations per second with strict tail-latency requirements.
Approach 4: Off-heap storage
One alternative was to move the LRU cache off-heap using mmap or cgo to allocate memory outside Go's managed heap. This would have hidden the data from the GC scanner. However, it would also have sacrificed Go's safety guarantees, required manual memory management through unsafe FFI, and introduced its own complexity.
At this point, the team concluded: if we are going to manage memory manually to avoid GC, we should use a language designed for manual memory management from the ground up, one with compile-time safety instead of runtime scanning.
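For a sense of what the off-heap option involves, here is a minimal sketch that stores fixed-size entries in anonymous `mmap` memory via the standard `syscall` package. It assumes a Unix-like system (Linux or macOS), and the `OffHeap` type and its 16-byte slot layout are hypothetical. It is not something the article says Discord built.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"syscall"
)

const entrySize = 16 // two uint64 fields per slot in this sketch

// OffHeap is a fixed-size table backed by anonymous mmap memory. The
// mapping lives outside Go's managed heap, so the GC never scans it.
type OffHeap struct {
	mem []byte
}

func NewOffHeap(slots int) (*OffHeap, error) {
	mem, err := syscall.Mmap(-1, 0, slots*entrySize,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	return &OffHeap{mem: mem}, nil
}

// Set encodes an entry into its slot by hand; there are no Go structs here.
func (o *OffHeap) Set(slot int, channelID, lastMessageID uint64) {
	b := o.mem[slot*entrySize : (slot+1)*entrySize]
	binary.LittleEndian.PutUint64(b[0:8], channelID)
	binary.LittleEndian.PutUint64(b[8:16], lastMessageID)
}

// Get decodes the entry stored in a slot.
func (o *OffHeap) Get(slot int) (channelID, lastMessageID uint64) {
	b := o.mem[slot*entrySize : (slot+1)*entrySize]
	return binary.LittleEndian.Uint64(b[0:8]), binary.LittleEndian.Uint64(b[8:16])
}

// Close releases the mapping. Forgetting to call it leaks the memory, and
// touching o.mem afterwards crashes the process.
func (o *OffHeap) Close() error { return syscall.Munmap(o.mem) }

func main() {
	table, err := NewOffHeap(1024)
	if err != nil {
		panic(err)
	}
	defer table.Close()
	table.Set(7, 42, 9001)
	ch, msg := table.Get(7)
	fmt.Println(ch, msg)
}
```

The manual encoding, explicit `Close`, and crash-on-misuse behavior illustrate the complexity and loss of safety that made this option unattractive.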
The Decision
Discord chose Rust. The decision was not "Rust is trendy" but "Rust solves the specific problem we have, with safety guarantees we need."
Why Rust over C/C++
C and C++ also lack garbage collectors. But they require the programmer to manually allocate and free memory, with no compile-time enforcement that freed memory is not accessed later (use-after-free) or that allocated memory is ever freed (leaks). In a service handling billions of requests daily, memory safety bugs translate directly to security vulnerabilities and production outages.