Discord: Go to Rust migration
How Discord rewrote a critical service from Go to Rust to eliminate GC pauses, achieving consistent sub-millisecond p99 latency and lower memory usage.
TL;DR
- Discord rewrote its Read States service from Go to Rust in 2020 to eliminate garbage collector pauses that spiked p99 latency every two minutes.
- The Go service showed p99 spikes to ~10ms on a predictable two-minute cycle, correlating exactly with Go's stop-the-world GC phases.
- The LRU cache at the core of the service processed ~50 billion entry evictions per day, creating massive allocation churn that overwhelmed GC tuning.
- After the Rust rewrite, p99 latency dropped to a consistent ~500µs with zero GC-related spikes.
- Memory usage fell ~30% because Rust structs carry no GC headers and the allocator has no runtime scanning overhead.
- The service handles billions of read-state updates daily across ~250M registered users and ~8.5M concurrent users (2020 numbers).
- Transferable lesson: for latency-sensitive services with high allocation churn, garbage collection is a real tail-latency risk, and the fix is not always "tune the GC."
The Trigger
By mid-2020, Discord's Read States service had a latency graph that looked like a heartbeat monitor. Every two minutes, p99 latency spiked from ~500µs to ~10ms, then dropped back down. The pattern was metronomic. Users did not see raw latency numbers, but they felt the intermittent sluggishness when switching servers and loading channel lists.
The Read States service tracks which messages each user has read across every channel and guild. At Discord's scale (roughly 250 million registered users, 8.5 million concurrent daily), this means millions of cache entries churning constantly as users open channels, receive messages, and switch servers. The service maintained an in-memory LRU cache with approximately 50 billion entry evictions per day.
Engineers on the infrastructure team profiled the service and found the spikes correlated exactly with Go's garbage collector cycles. Every time the GC kicked in to scan and reclaim evicted cache entries, it introduced stop-the-world pauses that pushed tail latency to unacceptable levels. I've seen similar patterns in Java services with large heaps, but Discord's case was particularly stark because the spike was so regular and so directly attributable to GC.
Why tail latency matters here
A 10ms p99 spike might sound small, but Discord's client aggregates multiple Read States calls per screen load. If a user belongs to 50 guilds, the client fans out requests, and the slowest response determines perceived load time. A 10ms p99 on one call translates to a roughly 40% chance of hitting at least one slow response across 50 parallel calls.
The team had already tried tuning Go's GC. They adjusted GOGC, experimented with memory ballast techniques, and restructured data to reduce pointer density. None of it eliminated the fundamental problem: Go's GC must scan live objects, and a high-churn LRU cache generates enormous volumes of short-lived allocations that the GC must process.
The scale of the problem
To understand why this mattered, consider the math. A p99 of 10ms means 1 in 100 requests is at least that slow. If a single user session triggers 50 parallel Read States queries (one per guild the user belongs to), the probability of at least one slow response is 1 - 0.99^50 ≈ 39.5%. Nearly four in ten page loads felt slow.
For power users in 100+ guilds, the probability climbed above 63%. These were Discord's most engaged users, the moderators and community builders who drive platform stickiness. Making their experience worse was not acceptable.
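The fan-out arithmetic above is easy to check directly; this sketch just evaluates 1 - (1 - p)^n for the guild counts in the text:

```rust
// Probability that at least one of `n` parallel calls lands in the slow
// p99 tail, assuming independent requests with tail probability `p_slow`.
fn p_any_slow(n: i32, p_slow: f64) -> f64 {
    1.0 - (1.0 - p_slow).powi(n)
}

fn main() {
    // 50 guilds, 1% tail probability -> ~39.5% of sessions see a slow call
    assert!((p_any_slow(50, 0.01) - 0.395).abs() < 0.005);
    // 100 guilds -> ~63.4%
    assert!((p_any_slow(100, 0.01) - 0.634).abs() < 0.005);
}
```

The independence assumption is approximate (real traffic has correlated spikes), but it is good enough to show why a small per-call tail compounds badly under fan-out.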
Why this was not a temporary problem
The LRU churn rate was proportional to daily active users. As Discord grew (2020 was a breakout year, partly driven by pandemic-era remote communication), the cache churn rate grew with it. The GC pressure was not a spike; it was a structural property of the workload that would only get worse over time.
The System Before
How the Go service worked
Each Go instance maintained a large in-memory LRU cache mapping (user_id, channel_id) pairs to read-state structs. When a user opened a channel, the service checked the cache first. On a cache hit (the common case), it returned the read position in microseconds. On a miss, it read from Cassandra and populated the cache.
The cache used a standard HashMap plus doubly-linked list implementation for O(1) lookups and O(1) evictions. Each entry was a Go struct with a channel ID, last-read message ID, mention count, and timestamps. The struct itself was small (roughly 80 bytes), but at millions of entries per instance, the aggregate heap was substantial.
```go
// Simplified Go read-state struct
type ReadStateEntry struct {
    ChannelID     uint64
    LastMessageID uint64
    MentionCount  uint32
    LastPinTS     uint64
}

// LRU eviction creates GC pressure:
// 1. New entry arrives → allocate ReadStateEntry on heap
// 2. LRU full → evict oldest entry → unlink from list
// 3. Evicted entry becomes garbage → GC must find and free it later
// 4. Millions of evictions/sec → massive GC scan workload
```
The request lifecycle
A typical request flow looked like this:
- User opens a channel in the Discord client.
- The client sends a read-state query via WebSocket to the gateway.
- The gateway routes the request to the correct Read States instance based on user ID shard.
- The service checks the LRU cache. On hit (>90% of requests), it returns immediately.
- On miss, it queries Cassandra, populates the cache, and returns.
- When the user reads messages, the client sends an update. The service writes to the cache and asynchronously persists to Cassandra.
This lifecycle generated constant cache churn. Every cache population is a heap allocation. Every eviction (when the cache reaches capacity) creates garbage. At peak traffic, each instance was allocating and deallocating hundreds of thousands of structs per second.
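The hit/miss flow above is the classic cache-aside pattern. A minimal sketch with hypothetical names, using a counter to stand in for Cassandra round-trips:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the read path, not Discord's actual code.
struct ReadStates {
    // (user_id, channel_id) -> last-read message ID
    cache: HashMap<(u64, u64), u64>,
    db_reads: u32, // counts simulated Cassandra queries
}

impl ReadStates {
    // Stand-in for a Cassandra fetch; real code would be async I/O.
    fn fetch_from_db(&mut self, _key: (u64, u64)) -> u64 {
        self.db_reads += 1;
        42 // dummy read position
    }

    fn read_state(&mut self, key: (u64, u64)) -> u64 {
        if let Some(&v) = self.cache.get(&key) {
            return v; // hot path: >90% of requests, served from memory
        }
        let v = self.fetch_from_db(key); // miss: go to storage
        self.cache.insert(key, v); // populate; this allocation is the churn
        v
    }
}

fn main() {
    let mut svc = ReadStates { cache: HashMap::new(), db_reads: 0 };
    svc.read_state((1, 10)); // miss: one DB read
    svc.read_state((1, 10)); // hit: no new DB read
    assert_eq!(svc.db_reads, 1);
}
```

Every miss-path `insert` is a heap allocation, and (with a bounded cache) eventually an eviction: that is the churn the rest of this section is about.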
What worked
The architecture was sound. Sharding by user ID distributed load evenly. Cassandra provided durable storage with tunable consistency. The LRU cache delivered sub-millisecond reads for the hot path. Average latency was excellent.
What didn't work
The Go garbage collector scans the entire heap to identify live objects. With millions of LRU entries being created and evicted continuously, the scanner had enormous work to do. Each GC cycle took long enough to push p99 latency from ~500µs to ~10ms.
The problem was structural, not configurational. Go's GC is one of the best in the managed-language world, but it operates on a fundamental constraint: it must trace reachable objects at runtime. A workload that creates and destroys millions of small objects per second is the worst case for any tracing GC.
I've worked with teams that spent months tuning JVM GC parameters for similar workloads. The result is usually a game of whack-a-mole: you reduce one pause type and another emerges. Discord's team arrived at the same conclusion for Go's collector.
Why Not Just Tune the Go GC?
The obvious first question: why not fix the GC instead of rewriting the service? Discord's team tried several approaches before concluding that a rewrite was necessary.
Approach 1: Adjust GOGC
Go's GOGC environment variable controls how aggressively the GC runs. A higher value (say, GOGC=800) lets the heap grow larger before triggering collection, reducing GC frequency but increasing memory usage and making each GC pause longer. A lower value triggers GC more often but with shorter pauses.
Discord tried both directions. Raising GOGC reduced spike frequency but did not shrink the spikes: each collection still had to trace the same millions of live LRU entries, and the larger heap raised memory pressure. Lowering GOGC made the spikes more frequent. Neither eliminated the fundamental problem.
The default GOGC=100 means the GC triggers when allocations since the last collection equal the live heap size. For a service with a 4GB live heap (millions of cached entries), this means the GC runs after roughly 4GB of new allocations, which at Discord's allocation rate happened every two minutes. Doubling GOGC to 200 delayed collection to every four minutes, but the pauses were no shorter, because the live set the GC must scan was unchanged.
Approach 2: Memory ballast
A common Go optimization is to allocate a large, never-freed byte slice (the "ballast") to inflate the heap baseline. This tricks the GC into running less often because the heap-to-live ratio stays lower. Discord experimented with ballast sizes from 1GB to 8GB.
The ballast helped marginally. GC frequency dropped, but when collection did run, it still had to scan the millions of live LRU entries. The spike magnitude was unchanged. The ballast technique works well for services with bursty allocation patterns; it does not help when the allocation pressure is constant and structural.
Approach 3: Reduce pointer density
Go's GC traces pointers. Structs with fewer pointer fields mean less work per GC cycle. Discord restructured some data to use value types instead of pointer types where possible. This reduced GC scan time but did not eliminate it.
The 'just tune GC' trap
When profiling reveals GC as the bottleneck, the instinct is to tune GC parameters. This works when GC pressure comes from temporary spikes or suboptimal allocation patterns. It does not work when the workload fundamentally conflicts with GC's operating model: millions of small, short-lived allocations per second with strict tail-latency requirements.
Approach 4: Off-heap storage
One alternative was to move the LRU cache off-heap using mmap or cgo to allocate memory outside Go's managed heap. This would have hidden the data from the GC scanner. However, it would also have sacrificed Go's safety guarantees, required manual memory management through unsafe FFI, and introduced its own complexity.
At this point, the team concluded: if we are going to manage memory manually to avoid GC, we should use a language designed for manual memory management from the ground up, one with compile-time safety instead of runtime scanning.
The Decision
Discord chose Rust. The decision was not "Rust is trendy" but "Rust solves the specific problem we have, with safety guarantees we need."
Why Rust over C/C++
C and C++ also lack garbage collectors. But they require the programmer to manually allocate and free memory, with no compile-time enforcement that freed memory is not accessed later (use-after-free) or that allocated memory is ever freed (leaks). In a service handling billions of requests daily, memory safety bugs translate directly to security vulnerabilities and production outages.
Rust's ownership system proves at compile time that memory is freed exactly once, at the right time, with no dangling references. This gives the team C-level performance with Java-level memory safety (or better, since Rust prevents data races at compile time too).
Why not a different GC-free approach
The team also considered:
- Manual memory pools in Go: Preallocate a fixed pool, bypass the GC for hot data. Rejected because it required unsafe Go code and reimplemented half of what Rust gives natively.
- C with a custom allocator: Maximum performance but no safety guarantees. The team estimated roughly 3x the engineering time for equivalent reliability.
- Zig: Promising but immature in 2020. The ecosystem lacked production-grade libraries for the team's needs.
The tradeoff matrix
| Factor | Go (current) | Rust | C++ | C |
|---|---|---|---|---|
| GC pauses | Yes (the problem) | None | None | None |
| Memory safety | Runtime (GC) | Compile-time | Manual | Manual |
| Ecosystem maturity | Excellent | Good (2020) | Excellent | Excellent |
| Team familiarity | High | Low | Medium | Medium |
| Estimated rewrite time | N/A | ~6 weeks | ~8 weeks | ~12 weeks |
| Maintenance burden | Low | Medium | High | Very high |
The team had limited Rust experience, but Rust's compiler acts as a mentor: it rejects unsafe code with detailed error messages. Discord estimated the learning curve would pay for itself quickly given the service's long expected lifetime.
The learning curve reality
Rust has a notoriously steep learning curve, particularly around lifetimes and the borrow checker. Discord's team reported that the first two weeks were slow as engineers fought the compiler. By week three, productivity normalized. By week six, the team was writing Rust at roughly the same velocity as Go for this type of service.
The key insight: Rust's learning curve is front-loaded. The compiler forces you to think about ownership and lifetimes upfront, which feels slow. But it eliminates entire categories of runtime bugs (null pointer dereferences, data races, use-after-free) that would otherwise appear in production. The time "lost" to the compiler is time saved in debugging and incident response.
Interview insight
When discussing language choice in system design, frame it as a tradeoff, not a preference. "We chose Rust because our workload has high allocation churn and strict p99 requirements, which conflicts with GC-based runtimes" is strong. "Rust is faster than Go" is weak.
The Migration Path
Discord did not do a big-bang cutover. They followed a phased approach with rollback capability at every stage. The entire migration took roughly six weeks from first Rust commit to full production traffic.
Phase 1: Prototype and benchmark (Weeks 1-2)
The team ported the core LRU cache data structure from Go to Rust. They kept the algorithm identical: a HashMap plus a doubly-linked list for O(1) access and O(1) eviction. The Rust version used Box for heap-allocated nodes and raw pointers within the linked list (wrapped in unsafe blocks with careful invariant documentation).
They ran identical synthetic workloads against both implementations. The Go version showed the familiar two-minute spike pattern. The Rust version showed a flat latency line. No spikes, no variance, just consistent sub-millisecond responses.
Phase 2: Shadow traffic (Weeks 3-4)
The Rust service was deployed alongside the existing Go instances. Production read requests were mirrored to Rust instances in shadow mode: the Rust service processed the same requests but its responses were discarded. Only the Go service served live traffic.
This phase validated correctness. The team compared Rust responses against Go responses for millions of requests, checking for any divergence in read-state values. They found and fixed two serialization edge cases during this phase.
The shadow infrastructure ran on separate instances to avoid affecting production capacity. Discord used their existing request-mirroring infrastructure (built into the gateway tier) to duplicate traffic without modifying the Go service. This is a pattern worth noting: if your gateway already supports traffic mirroring, shadow testing is nearly free to set up.
Phase 3: Canary (Week 5)
With correctness validated, 1% of live traffic shifted to the Rust service. The team monitored latency percentiles, error rates, memory usage, and CPU utilization side by side. The Rust canary showed flat p99 at ~500µs while the Go instances continued their two-minute spike cycle.
The 1% canary ran for a full 48 hours before proceeding. The team specifically waited for multiple peak traffic windows (US evening, EU evening, Asia morning) to validate performance under different load profiles. All metrics remained stable.
The rollback plan was simple: shift the 1% back to Go via a load balancer configuration change. No data migration needed because both services read from the same Cassandra cluster.
Phase 4: Full rollout (Week 6)
Traffic shifted gradually: 1% to 10% to 50% to 100% over the course of a week. At each step, the team watched for any degradation. None appeared. By the end of week six, all Go instances were drained and the Rust service handled 100% of production traffic.
The Go container images were preserved for 90 days as a rollback option. They were never needed.
Key migration safeguards
Several design choices made this migration low-risk:
- Stateless service: Read States instances hold only cached data. If a Rust instance crashed, the load balancer routed to another instance, and the cache was repopulated from Cassandra. No data loss possible.
- Shared storage backend: Both Go and Rust instances read from the same Cassandra cluster. No data migration was needed, which eliminated the riskiest class of migration bugs.
- Feature parity, not feature expansion: The Rust service implemented exactly the same API. No new features, no new endpoints, no behavior changes. This made comparison testing deterministic.
- Independent validation at each phase: Each phase had explicit success criteria (benchmark numbers, response match rate, latency percentiles) and could not proceed until criteria were met.
The shadow traffic pattern
Shadow traffic (also called dark launching) is one of the safest migration strategies for stateless read services. You validate correctness without risking live users. If the new service produces wrong answers, nobody sees them. This pattern works whenever the service is read-heavy and the responses are deterministic.
The System After
What changed architecturally
The architecture is almost identical. Same sharding strategy, same Cassandra backend, same WebSocket gateway routing. The only change is the language runtime of the Read States service itself.
This is the key insight: the rewrite was not an architecture change. It was a runtime change. The data model, the API contract, the deployment topology, and the storage layer all remained the same. The only thing that changed was how memory is managed inside each service instance.
How the Rust LRU cache works
The Rust LRU cache uses the same algorithmic approach as the Go version. A HashMap maps keys to node pointers, and a doubly-linked list maintains access order. When an entry is evicted, Rust's ownership system ensures the memory is freed immediately, at the point of eviction, not deferred to a later GC scan.
```rust
// Simplified Rust LRU cache entry
struct ReadStateEntry {
    channel_id: u64,
    last_message_id: u64,
    mention_count: u32,
    last_pin_timestamp: u64,
}

// When an entry is evicted from the LRU:
// 1. Remove from HashMap (O(1))
// 2. Unlink from doubly-linked list (O(1))
// 3. Memory freed immediately via Drop
// No GC scan, no deferred collection, no pause
```
The Drop trait in Rust is deterministic: when a value goes out of scope or is explicitly dropped, its destructor runs inline. There is no background thread scanning for dead objects. Deallocation happens at the exact moment the entry is evicted, which means latency is constant and predictable.
This determinism is the core advantage. In a GC language, you trade predictability for convenience: the GC handles deallocation so you do not have to think about it, but it picks its own timing. In Rust, you trade convenience for predictability: you must think about ownership, but deallocation timing is fully under your control.
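A minimal illustration of that determinism: the destructor runs inline at the drop point, not on a collector's schedule.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static FREED: AtomicBool = AtomicBool::new(false);

struct Entry; // stand-in for a cache entry

impl Drop for Entry {
    fn drop(&mut self) {
        // Runs inline, at the exact moment the value is dropped.
        FREED.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let e = Entry;
    assert!(!FREED.load(Ordering::SeqCst)); // still live
    drop(e); // the eviction analogue: deallocation happens right here
    assert!(FREED.load(Ordering::SeqCst)); // already freed, deterministically
}
```

There is no window between "entry evicted" and "memory reclaimed" for a background scanner to fill; the two are the same event.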
Memory layout advantages
Rust structs have no runtime overhead. There are no GC headers, no type metadata pointers, no alignment padding for the GC. A ReadStateEntry in Rust occupies exactly the bytes its fields require (plus alignment). The equivalent Go struct carries additional overhead for the GC's internal bookkeeping.
This tighter memory layout means more entries fit in CPU cache lines, improving cache hit rates for the HashMap lookups. It also means less total memory consumption for the same number of entries.
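The layout claim is easy to verify for the simplified struct shown earlier (the real entry, at roughly 80 bytes, has more fields):

```rust
use std::mem::size_of;

struct ReadStateEntry {
    channel_id: u64,
    last_message_id: u64,
    mention_count: u32,
    last_pin_timestamp: u64,
}

fn main() {
    // Three u64s + one u32 = 28 bytes of fields, padded to 8-byte
    // alignment: 32 bytes total, with no GC header or metadata pointer.
    assert_eq!(size_of::<ReadStateEntry>(), 32);
}
```

Every byte in the struct is field data or alignment padding; there is no per-object runtime bookkeeping to inflate it.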
Async runtime: Tokio
The Rust service uses Tokio as its async runtime for handling concurrent connections. Tokio's work-stealing scheduler distributes tasks across OS threads efficiently, and unlike the Go runtime, it brings no garbage collector along. Task switching is cooperative (via .await points), giving the application precise control over when context switches happen.
This matters for the LRU workload because cache lookups are CPU-bound (HashMap probe, linked-list update). In Go, the runtime can preempt a goroutine mid-operation for GC. In Rust with Tokio, a task runs uninterrupted until it explicitly yields. For a sub-microsecond cache lookup, this means zero scheduling overhead per request.
The Results
| Metric | Go Service (Before) | Rust Service (After) |
|---|---|---|
| p99 latency (normal) | ~500µs | ~500µs |
| p99 latency (during GC spike) | ~10ms | ~500µs (no spikes) |
| GC pause frequency | Every ~2 minutes | None (no GC) |
| Memory usage per instance | Baseline | ~70% of Go baseline |
| CPU utilization | Baseline | ~85% of Go baseline |
| Cache entry overhead | ~120 bytes (with GC headers) | ~80 bytes (raw struct) |
| Tail latency variance | High (20x p50-to-p99 ratio) | Low (< 2x p50-to-p99 ratio) |
Interpreting the numbers
The average latency did not improve. Both the Go and Rust services delivered ~500µs median responses. The improvement was entirely in the tail: the p99 spikes disappeared.
This is a critical distinction. If you only look at average latency, the rewrite accomplished nothing. The value was in eliminating variance. For a service that receives fan-out reads (many parallel requests per user session), the slowest response determines the user-perceived latency. Eliminating the 10ms spikes made Discord's client feel consistently fast instead of intermittently sluggish.
Memory usage dropped ~30%. This came from two sources: no GC bookkeeping overhead per object, and tighter struct packing. The CPU savings (~15%) came from eliminating GC scan work entirely. Those CPU cycles were previously spent tracing pointers through the heap.
The latency distribution shift
The most telling metric was not p99 alone but the full latency distribution. In the Go service, the histogram showed a tight cluster around 500µs with a long tail stretching to 10ms+ that appeared every two minutes. The Rust service's histogram was a single tight cluster with no secondary mode.
This is what "deterministic performance" looks like in practice. Without a GC, there is no background process that can suddenly compete for CPU time. Latency variance comes only from application logic, network I/O, and kernel scheduling, all of which are consistent and predictable for a cache lookup service.
Impact on downstream services
The Read States improvement had a ripple effect. Services that depended on Read States (guild list rendering, notification badge calculation, unread indicators) all saw their own p99 improve because their slowest dependency got faster. In a microservices architecture, tail latency in one service propagates to every caller. Fixing the tail in Read States effectively fixed the tail across a portion of the dependency graph.
Discord observed that aggregated client-side page load p99 improved by ~15% after the Rust rollout, even though Read States was just one of many services contributing to page load time. That single-service improvement rippled through the fan-out.
What stayed the same
Cassandra read/write patterns were unchanged. The Rust service used the same Cassandra driver (via Rust bindings), the same consistency levels, and the same retry policies. Network latency to Cassandra was unchanged. The improvement was entirely within the service process boundary.
The 30% memory saving adds up
At Discord's scale, a 30% reduction in memory per instance translates to either running fewer instances (cost savings) or fitting more cache entries per instance (better hit rates). Discord chose the latter, improving the cache hit ratio and further reducing Cassandra read load.
What They'd Do Differently
No migration is perfect. Based on public talks and blog posts from Discord engineers, several retrospective insights emerged.
Build comprehensive benchmarks before starting
Discord's initial benchmarks were synthetic: uniform key distribution, steady request rate. Production traffic has bursts, hot keys, and correlated access patterns. The shadow traffic phase (Phase 2) caught this gap, but the team wished they had built a replay harness that could feed recorded production traffic through both implementations from day one.
I've seen this gap bite teams repeatedly. Synthetic benchmarks prove the concept; production replay proves correctness. Build both before you start, not after you discover divergence in shadow mode.
If you have the infrastructure for it, record a day's worth of production requests (anonymized if needed) and replay them against the new implementation. This catches edge cases that uniform synthetic traffic never triggers: burst patterns, hot keys, clock skew in timestamps, and unusual field combinations.
Start with the data structure, not the whole service
The team ported the entire service at once. In hindsight, they could have extracted just the LRU cache into a Rust library called via FFI from Go. This would have isolated the GC-sensitive code without rewriting the HTTP handling, serialization, and Cassandra interaction layers.
The counterargument: FFI between Go and Rust introduces its own overhead (CGo call overhead is not trivial, roughly 50-100ns per call), and maintaining a polyglot service adds operational complexity. The full rewrite was simpler in the long run. But for teams with less Rust experience, the FFI approach would be a lower-risk starting point.
Invest in Rust tooling earlier
The team's Rust debugging and profiling toolchain was less mature than their Go toolchain. They spent time building custom Prometheus exporters and integrating with their existing observability stack. Starting this work in Phase 1 (instead of Phase 3) would have accelerated the canary phase.
Document unsafe blocks rigorously
The LRU cache implementation required unsafe Rust for the doubly-linked list (raw pointers for O(1) node manipulation). The initial implementation had minimal documentation on safety invariants. I've seen this pattern repeatedly: teams writing unsafe Rust treat it like normal Rust and skip the invariant documentation. In a later code review cycle, Discord added detailed safety comments to every unsafe block.
Operational changes post-migration
The Rust service required some operational adjustments compared to Go:
- Build times: Rust compiles significantly slower than Go. CI pipeline time for the Read States service went from ~30 seconds (Go) to ~3 minutes (Rust). The team mitigated this with incremental compilation and build caching (sccache).
- Binary size: The Rust binary was larger (~15MB vs ~8MB for Go), but this had negligible impact on deployment speed.
- Debugging: Rust's backtrace support (RUST_BACKTRACE=1) provided good stack traces, but the team missed Go's built-in profiler (pprof). They integrated perf and flamegraph tools to compensate.
- Dependency management: Cargo (Rust's package manager) was well-regarded by the team. Dependency resolution and version pinning worked reliably, comparable to Go modules.
The operational overhead was modest. The team estimated roughly 10% more time spent on build and debug tooling, offset by the disappearance of GC-related production incidents, which had previously consumed investigation time even though they were "expected" behavior.
Architecture Decision Guide
Use this decision sequence when evaluating whether to rewrite a service in a non-GC language:
- Q1: Is the problem tail latency (p99/p99.9), not average latency?
- Q2: Have you exhausted GC tuning (GOGC-style knobs, ballast, reduced pointer density)?
- Q3: Have you profiled and confirmed GC is the bottleneck?
- Q4: Is the allocation churn structural to the workload, not a fixable access pattern?
If any answer is no, keep your current GC language.
The critical gate is Q3: have you profiled and confirmed GC is the bottleneck? Rewriting a service without profiler evidence that GC causes the problem is cargo-culting Discord's solution. Most latency problems are not GC-related, even in GC languages. Network I/O, database queries, lock contention, and algorithmic inefficiency are far more common culprits.
Most services should not follow Discord's path
This decision flowchart will lead the majority of services to "keep your current GC language." That is intentional. Rewriting in Rust is expensive (team ramp-up, ecosystem gaps, hiring pipeline constraints) and only justified when the profiler proves GC is the dominant source of tail latency after tuning exhaustion. If your p99 is bottlenecked on database queries, switching languages will not help.
Transferable Lessons
1. Profile before you rewrite
Discord confirmed GC was the bottleneck with profiler data before writing a single line of Rust. They did not rewrite because Rust was fashionable or because someone read a blog post. The profiler showed exactly where time was spent, and the fix addressed that exact problem.
This applies everywhere. Before any major architectural change, instrument the system and prove that the proposed change addresses the actual bottleneck. I've seen teams rewrite services for "performance" only to discover the bottleneck was in the database query layer, untouched by the rewrite.
2. Tail latency matters more than average latency
The average latency of Discord's Go service was fine. The problem was the p99 spikes. In fan-out architectures (where a single user request triggers multiple backend calls), the slowest call determines user-perceived latency. Optimizing the average while ignoring the tail optimizes the metric nobody experiences.
When designing latency-sensitive systems, always measure and optimize for p99 (or p99.9), not the mean. This principle applies regardless of language choice.
3. GC is a tradeoff, not a defect
Go's garbage collector is not broken. It is an excellent GC for the vast majority of workloads. The problem was a specific interaction: a data structure that generates enormous allocation churn combined with strict tail-latency requirements. Most Go services never hit this issue.
The transferable principle: every runtime has edge cases. Know what your runtime's edge cases are, and measure whether your workload hits them. Do not assume your workload is special until the profiler proves it.
4. Incremental migration beats big-bang rewrite
Discord's four-phase migration (prototype, shadow, canary, rollout) with rollback capability at each phase is a textbook safe migration. They never had a moment where a failure would have caused user-visible impact. The shadow traffic phase caught two bugs before any live traffic touched the new service.
For any service rewrite, plan phases with independent validation criteria and rollback plans. The "rewrite everything, deploy on Friday" approach is how outages happen.
5. The best architecture change is sometimes no architecture change
The Read States rewrite changed zero architectural components. Same sharding, same database, same API, same deployment topology. The only change was the language runtime of one service. This is worth emphasizing: the team resisted the temptation to "improve" the architecture while rewriting. They changed exactly one variable (the language) and measured the result.
When fixing a specific problem, change one thing at a time. If you rewrite the service AND change the database AND redesign the API, you cannot attribute the improvement (or regression) to any single change.
Bonus: Discord's broader Rust adoption
The Read States rewrite was not an isolated experiment. After its success, Discord adopted Rust for additional performance-sensitive services. Their Elixir-based voice signaling server was later rewritten in Rust, and new infrastructure services (including parts of their message storage pipeline) were built in Rust from the start.
This trajectory is common: a single successful Rust rewrite builds team expertise and confidence, which lowers the cost of subsequent Rust projects. The first project is expensive (learning curve). The second and third are significantly cheaper because the tooling, libraries, and team knowledge already exist.
The flywheel effect of language adoption
Discord's Rust adoption followed a pattern seen at other companies (Dropbox, Cloudflare, Figma): one high-visibility success project proves the language works in your environment, builds internal champions, and creates reusable libraries. This reduces the cost-benefit bar for subsequent projects. The lesson: if you are considering Rust (or any new language), pick a single well-scoped service for the first project, not the hardest or most critical system.
How This Shows Up in Interviews
When to cite this case study
Mention Discord's Go-to-Rust migration when the interviewer asks about:
- Latency-sensitive caching layers
- Language/runtime selection for performance-critical services
- GC tuning limits in Java/Go/C#
- Tail latency in fan-out architectures
- Safe migration strategies for critical services
- Runtime tradeoffs (GC overhead vs. development velocity)
The sentence to say: "Discord rewrote their Read States cache from Go to Rust specifically to eliminate GC-induced p99 spikes, cutting tail latency from 10ms to a consistent 500 microseconds."
How to use it without over-applying
The biggest mistake candidates make is citing this case study to justify Rust for every service. The interviewer will push back: "Why not just use Go with a bigger instance?" or "Isn't the development cost too high?"
The strong response acknowledges the narrow applicability: "This pattern applies specifically when you have high allocation churn, strict p99 requirements, and you have profiler evidence that GC is the bottleneck after tuning. For most services, Go or Java with tuned GC parameters is the right choice because the development velocity more than compensates for the small GC overhead."
I've seen candidates lose points by over-indexing on this case study. The interviewer might follow up with "What if the team had only two weeks?" or "What if the service also needed to add new features?" In both cases, the answer changes: a constrained timeline or feature velocity requirement might favor GC tuning and accepting the p99 spike over a full rewrite.
Q&A table
| Interviewer Asks | Strong Answer |
|---|---|
| "How would you handle GC pauses in a cache service?" | "First, profile to confirm GC is the actual bottleneck. Then try GOGC tuning, memory ballast, and off-heap storage. If none work (as Discord found), consider extracting the hot path to a non-GC language like Rust, which gives you deterministic deallocation without sacrificing memory safety." |
| "When would you choose Rust over Go?" | "When the workload has high allocation churn and strict p99 requirements. Discord's LRU cache evicted billions of entries daily, and Go's GC scanned them all. Rust frees memory at eviction time with zero runtime overhead. For most services, Go's GC is excellent and the rewrite cost is not justified." |
| "How do you safely migrate a critical service?" | "Shadow traffic first: mirror production reads to the new service and compare responses. Then canary at 1% with full rollback capability. Then gradual rollout (1% to 10% to 50% to 100%) with monitoring at each step. Discord completed this in six weeks with zero user-visible incidents." |
| "Why does tail latency matter more than average?" | "In fan-out architectures, the slowest response determines user-perceived latency. If a user triggers 50 parallel backend calls and each has a 1% chance of hitting a 10ms spike, there is a 40% chance at least one call is slow. Optimizing the average while ignoring the tail optimizes a metric nobody experiences." |
| "Is Go bad for performance-critical services?" | "No. Go's GC is among the best. The issue is a specific workload pattern: millions of short-lived allocations per second in a latency-sensitive path. Most Go services (API servers, CLI tools, network proxies) never hit this. The lesson is to know your runtime's edge cases, not to avoid GC languages." |
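The fan-out arithmetic in the tail-latency answer above is worth being able to reproduce on a whiteboard: if each of n independent parallel calls has probability p of hitting a spike, the chance that at least one is slow is 1 - (1 - p)^n. A minimal check of the 50-call, 1% figure:

```rust
// Probability that at least one of n independent parallel calls is slow:
// p_any = 1 - (1 - p)^n
fn p_any_slow(p: f64, n: u32) -> f64 {
    1.0 - (1.0 - p).powi(n as i32)
}

fn main() {
    // 50 fan-out calls, each with a 1% chance of landing in a GC pause.
    println!("{:.1}%", p_any_slow(0.01, 50) * 100.0); // ~39.5%, i.e. "about 40%"
}
```

This is also why per-call tail improvements compound: cutting p from 1% to 0.1% drops the any-slow probability from roughly 40% to under 5%.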
Quick Recap
- Discord's Go Read States service experienced metronomic p99 latency spikes every two minutes, caused by garbage collector pauses scanning millions of LRU cache entries.
- The team tried four GC-tuning approaches (GOGC adjustment, memory ballast, pointer density reduction, off-heap storage) before concluding that a rewrite was necessary.
- They chose Rust over C/C++ because Rust provides compile-time memory safety without garbage collection, eliminating use-after-free and data race bugs at compile time.
- The migration followed four phases (prototype, shadow traffic, canary, full rollout) over six weeks, with rollback capability at every stage.
- Shadow traffic testing caught two serialization bugs (timestamp precision mismatch, column ordering divergence) before any live traffic hit the new service.
- The Rust service achieved the same average latency but eliminated all p99 spikes, while reducing memory usage by ~30% and CPU usage by ~15%.
- The architecture did not change: same sharding, same Cassandra backend, same API contract. Only the language runtime changed.
- Downstream services saw their own p99 improve because their slowest dependency (Read States) was no longer producing periodic spikes.
- Transferable principle: for latency-sensitive services with high allocation churn, profile first, tune second, and rewrite only when profiler data confirms GC is the actual bottleneck that tuning cannot fix.
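The mechanism behind the recap's headline result, deterministic deallocation at eviction, can be sketched in a few lines. This is not Discord's cache (theirs is a true LRU with recency updates; this sketch is a simplified FIFO with hypothetical names), but it shows the key property: the evicted entry's memory is reclaimed at the moment of eviction, inside the insert call, with no collector ever scanning the heap.

```rust
use std::collections::VecDeque;

// Value whose Drop runs the instant it is evicted: no GC scan, no pause.
struct Entry {
    key: u64,
}

impl Drop for Entry {
    fn drop(&mut self) {
        // In the real service, this is where billions of daily evictions
        // reclaim memory deterministically instead of awaiting a collector.
        println!("freed entry {}", self.key);
    }
}

// Simplified FIFO cache, not a true LRU: no recency updates on read.
struct BoundedCache {
    cap: usize,
    entries: VecDeque<Entry>, // front = oldest
}

impl BoundedCache {
    /// Insert a key, evicting the oldest entry if at capacity.
    /// Returns the evicted key, whose Entry is dropped (freed) here.
    fn insert(&mut self, key: u64) -> Option<u64> {
        let evicted = if self.entries.len() == self.cap {
            self.entries.pop_front().map(|e| e.key) // Entry dropped here
        } else {
            None
        };
        self.entries.push_back(Entry { key });
        evicted
    }
}

fn main() {
    let mut cache = BoundedCache { cap: 2, entries: VecDeque::new() };
    for k in 0..3u64 {
        if let Some(old) = cache.insert(k) {
            println!("evicted key {old}; its memory was freed synchronously");
        }
    }
}
```

Contrast with the Go version: the evicted entry there becomes garbage that the next GC cycle must discover by scanning live pointers, which is precisely the work that produced the two-minute spike pattern.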
Related Concepts
Below are the core concepts that connect to this case study. Each is worth exploring in depth to understand the underlying principle.
- Caching strategies: The LRU cache at the heart of this case study. Understanding cache eviction policies and memory trade-offs helps you predict when GC pressure becomes a problem.
- Consistency models: Discord's Cassandra backend uses tunable consistency. The Read States service's cache-aside pattern interacts with Cassandra's eventual consistency model.
- Replication: Cassandra's replication strategy determines how read-state durability works underneath the cache layer. The tunable consistency levels (ONE, QUORUM, ALL) directly affect read/write latency tradeoffs.
- Latency vs throughput: This case study is a textbook example of optimizing for tail latency rather than throughput. The Go service had excellent throughput; the problem was exclusively in tail latency variance. Understanding Little's Law and percentile math helps you quantify the impact.
- Observability: Discord's ability to correlate latency spikes with GC events required mature metrics and tracing infrastructure. Without that observability, the team would not have identified the root cause.
- SLOs, SLIs, and SLAs: The p99 latency target that drove this rewrite is an SLO. Understanding how to define and measure SLOs for tail latency helps you know when GC pauses become a violation rather than a nuisance.
- Distributed tracing: Correlating GC events with latency spikes required distributed tracing. Without trace-level visibility into which requests hit GC pauses, the team could not have diagnosed the root cause with confidence.