Slack: Migrating to a relational model
How Slack redesigned its message storage from a flat text log with metadata columns to a relational channel model, enabling more efficient queries and reducing read amplification.
TL;DR
- Slack's original message storage used a flat schema with workspace_id + channel_id + ts as the primary key, optimized for append-only chat logs.
- As features accumulated (threads in 2017, reactions, edits, bookmarks), every new feature required join chains keyed on the timestamp, a value that also served as sort key, identity, and permalink anchor.
- Slack introduced a canonical message_id column to separate "identity" from "display order," normalized related tables around it, and moved variable metadata into a JSON body column.
- The migration covered hundreds of billions of rows across a Vitess-sharded MySQL cluster, using a dual-write + backfill strategy with zero downtime over 2018-2020.
- Transferable lesson: when your primary key carries multiple responsibilities (sorting, identity, foreign key), schema evolution becomes exponentially harder. Separate concerns early.
The Trigger
By mid-2018, Slack's message storage was buckling under feature weight that nobody anticipated in 2013. Loading a single channel view required joins across six tables, all keyed on a microsecond-precision Unix timestamp that was never designed to be a relational foreign key.
The symptom was read amplification. A "load channel" query that once scanned a single table now fanned out into multiple round trips: fetch messages, then join reactions, then join edits, then join thread metadata, then join bookmarks, then join pinned items. Each join used (workspace_id, channel_id, ts) as the key. At Slack's scale (tens of millions of active users, thousands of messages per second), this fanout was measurable in both latency and database CPU.
The tipping point came when the team started building Slack Connect (cross-organization channels). The flat schema made cross-workspace queries nearly impossible without duplicating data. The engineering team decided that patching the existing model was no longer viable: the schema itself needed to change.
The System Before
Slack started as a side project inside Tiny Speck, a gaming company building Glitch. The internal chat tool became the product. The original message schema reflected that origin: a simple append log optimized for writing messages fast and reading them back in order.
The original messages table looked roughly like this:
-- Simplified original schema (circa 2013-2017)
CREATE TABLE messages (
    workspace_id    VARCHAR(12)   NOT NULL,
    channel_id      VARCHAR(12)   NOT NULL,
    ts              DECIMAL(17,6) NOT NULL,  -- Unix epoch, microsecond precision
    user_id         VARCHAR(12)   NOT NULL,
    type            VARCHAR(32)   NOT NULL,
    text            TEXT,
    has_attachments BOOLEAN DEFAULT FALSE,
    has_reactions   BOOLEAN DEFAULT FALSE,
    is_pinned       BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (workspace_id, channel_id, ts)
);
For the core use case (load 50 most recent messages in a channel), this schema was excellent. One range scan on (workspace_id, channel_id, ts DESC LIMIT 50) returned everything. No joins, no secondary lookups. The write path was equally clean: insert a row with the current timestamp. Done.
The ts column deserves special attention. Slack chose a microsecond-precision Unix timestamp as the primary key for a few reasons. It provided natural sort order (messages sort by time). It was URL-safe for permalinks (/archives/C01234/p1529000000000100). And it was unique enough within a channel because microsecond precision meant collisions were near-impossible.
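The permalink encoding is simple enough to sketch. Assuming the token is just the microsecond count with the decimal point dropped (the p1529000000000100 format above; function names here are illustrative, not Slack's actual code), a round-trip looks like this:

```python
from decimal import Decimal

def ts_to_permalink(ts: Decimal) -> str:
    """Encode a microsecond-precision ts as a permalink token.

    Illustrative sketch: shift the decimal point six places so that
    1529000000.000100 becomes the 16-digit token p1529000000000100.
    """
    micros = int(ts.scaleb(6))  # multiply by 10^6 -> integer microseconds
    return f"p{micros}"

def permalink_to_ts(token: str) -> Decimal:
    """Invert the encoding: strip the leading 'p', restore the decimal point."""
    return Decimal(token[1:]).scaleb(-6)

print(ts_to_permalink(Decimal("1529000000.000100")))  # p1529000000000100
print(permalink_to_ts("p1529000000000100"))           # 1529000000.000100
```

Using Decimal rather than float matters here: microsecond timestamps exceed the 53-bit precision where floats stay exact, so a float round-trip could silently change the permalink.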
I've seen this exact pattern in early-stage systems: pick a naturally ordered value as the primary key because it simplifies the first three features. The problem only surfaces two years later when feature four needs that key to do something it was never designed for.
How feature accretion broke the model
Each new feature added a table, and each new table used ts as its foreign key:
Threads (2017). Slack introduced threaded replies, which meant a message could be a "parent" with child replies. The thread relationship was modeled with a thread_ts column pointing back to the parent message's ts. Loading a thread required a self-join: find all messages where thread_ts = parent.ts. This worked, but now ts was both a sort key and a relational pointer.
Reactions. Emoji reactions lived in a separate table keyed by (workspace_id, channel_id, message_ts, user_id, reaction). Any query that needed to show reactions alongside messages required a join back through ts. Showing a channel with 50 messages, each potentially having reactions, meant 50 point lookups into the reactions table.
Edits. Slack stores edit history for messages. The edits table was keyed by (workspace_id, channel_id, ts, edit_ts). Displaying the current version of a message required joining to the edits table and taking the most recent edit. That is two range scans per message.
Bookmarks and pins. More tables, more joins, same key. By 2018, loading a channel with threads, reactions, and edit history touched six tables, all joined on a Unix timestamp.
The compound effect was devastating for read performance. A query that started as a single range scan now looked like:
-- Pseudocode: loading a channel view by 2018
SELECT m.*,
r.reactions,
e.latest_edit,
t.reply_count,
p.is_pinned,
b.is_bookmarked
FROM messages m
LEFT JOIN reactions r ON (m.workspace_id = r.workspace_id
AND m.channel_id = r.channel_id
AND m.ts = r.message_ts)
LEFT JOIN edits e ON (m.workspace_id = e.workspace_id
AND m.channel_id = e.channel_id
AND m.ts = e.message_ts)
LEFT JOIN threads t ON (m.workspace_id = t.workspace_id
AND m.channel_id = t.channel_id
AND m.ts = t.thread_ts)
LEFT JOIN pins p ON (m.workspace_id = p.workspace_id
AND m.channel_id = p.channel_id
AND m.ts = p.message_ts)
LEFT JOIN bookmarks b ON (m.workspace_id = b.workspace_id
AND m.channel_id = b.channel_id
AND m.ts = b.message_ts)
WHERE m.workspace_id = ? AND m.channel_id = ?
ORDER BY m.ts DESC
LIMIT 50;
Every join key was a three-column composite that included a decimal timestamp. Index lookups on DECIMAL(17,6) are not catastrophically slow, but they are slower than integer lookups, and doing six of them per channel load at Slack's scale was burning real CPU.
The multi-responsibility key trap
When a single column serves as sort key, display value, permalink component, and foreign key simultaneously, you cannot change any of those behaviors independently. Slack could not switch to a more join-friendly key format without breaking permalinks. They could not add a secondary sort dimension without rethinking identity. The ts column was load-bearing in too many directions.
Why Not Just Add Indexes?
The obvious first question: why not add covering indexes and keep the schema?
Slack's team evaluated several incremental approaches before deciding on a full schema redesign.
More indexes on the flat schema. Adding covering indexes for the common join patterns would speed up reads but dramatically increase write amplification. Every message insert would update six indexes (one per join target). At thousands of messages per second, the index maintenance cost was prohibitive.
Denormalize everything into the messages table. Store reactions, edit status, thread metadata, and pin state directly in the messages row. This eliminates joins entirely but creates a new problem: updating a reaction or adding a thread reply requires a write to the messages table, which contends with new message inserts. It also means the messages row grows unboundedly as features accumulate.
Use a document store (MongoDB, DynamoDB). Switching to a document model would let Slack embed all metadata in a single document per message. But Slack's entire infrastructure was built around MySQL. Cross-cutting features like search, compliance exports, and enterprise audit logs ran SQL queries against the messages schema. Switching storage engines would require rewriting dozens of downstream systems.
Cache the join results. Front the database with a cache that stores pre-joined channel views. This helps for hot channels but has invalidation issues: a single reaction, edit, or pin invalidates the cached view for the whole channel. At Slack's feature velocity, cache invalidation would be a permanent engineering drag.
None of these solved the root problem: ts was the wrong type of key for relational operations. The team decided to fix the schema at the source.
| Approach | Read Benefit | Write Penalty | Why Rejected |
|---|---|---|---|
| More indexes | Faster joins | 6x index writes per message | Write amplification too high |
| Denormalize into messages | Zero joins | Update contention on hot rows | Unbounded row growth |
| Switch to document store | No joins needed | Rewrite all downstream SQL | Too much blast radius |
| Cache join results | Sub-ms reads for hot channels | Invalidation on every mutation | Permanent engineering drag |
The Decision
Slack chose a three-part approach: introduce a canonical message_id, normalize the schema around it, and move flexible metadata into a JSON column.
Part 1: Separate identity from display order
The core insight was that ts was doing too many jobs. By introducing a dedicated message_id (a bigint auto-increment), Slack could:
- Use message_id as the stable foreign key for all related tables.
- Keep ts for display ordering and permalink generation.
- Allow ts to be imprecise or even adjusted without breaking relational integrity.
This is the same principle behind surrogate keys in database design. Natural keys (timestamps, emails, usernames) seem convenient but resist schema evolution. Surrogate keys (auto-increment integers, UUIDs) are boring and stable.
Part 2: Normalize around message_id
-- New messages table (simplified)
CREATE TABLE messages (
    message_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    workspace_id VARCHAR(12)     NOT NULL,
    channel_id   VARCHAR(12)     NOT NULL,
    ts           DECIMAL(17,6)   NOT NULL,
    user_id      VARCHAR(12)     NOT NULL,
    body         JSON,  -- text, attachments, formatting metadata
    PRIMARY KEY (workspace_id, channel_id, message_id),
    UNIQUE KEY (workspace_id, channel_id, ts),
    KEY (message_id)  -- InnoDB requires the AUTO_INCREMENT column to lead an index
);

-- Reactions now key on message_id
CREATE TABLE reactions (
    message_id BIGINT UNSIGNED NOT NULL,
    user_id    VARCHAR(12)     NOT NULL,
    reaction   VARCHAR(64)     NOT NULL,
    created_at TIMESTAMP       NOT NULL,
    PRIMARY KEY (message_id, user_id, reaction)
);

-- Threads reference parent by message_id
CREATE TABLE threads (
    parent_message_id BIGINT UNSIGNED NOT NULL,
    reply_message_id  BIGINT UNSIGNED NOT NULL,
    reply_ts          DECIMAL(17,6)   NOT NULL,
    PRIMARY KEY (parent_message_id, reply_message_id)
);
The difference in join efficiency is significant. Instead of joining on a three-column composite key with a decimal, every join now uses a single BIGINT column. Integer equality checks are the fastest operation a B-tree index can do.
Part 3: JSON body column
Rather than adding a new column every time a feature needed message-level metadata (formatting options, link previews, app integrations), Slack moved variable fields into a body JSON column. This traded relational indexing for schema flexibility.
The tradeoff is real. You cannot write WHERE body->>'$.has_link_preview' = true efficiently across billions of rows. But Slack did not need to: the primary access patterns (load by channel, load by thread) never filter on those fields. They are display metadata, read after the row is fetched.
I've seen teams reach for JSON columns too eagerly, storing data there that they later need to query relationally. Slack's approach worked because they were disciplined about what went into JSON (display-only metadata) versus what stayed as a proper column (anything used in WHERE clauses or joins).
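That discipline is easy to demonstrate. In this sketch (SQLite standing in for the sharded MySQL cluster, schema heavily simplified), the query filters only on indexed columns; the JSON body is decoded in application code after the rows come back, never inside a WHERE clause:

```python
import json
import sqlite3

# In-memory stand-in for the message store; table layout is illustrative.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE messages (
        message_id INTEGER PRIMARY KEY,
        channel_id TEXT NOT NULL,
        ts REAL NOT NULL,
        body TEXT NOT NULL  -- JSON: display-only metadata
    )
""")
db.execute(
    "INSERT INTO messages (channel_id, ts, body) VALUES (?, ?, ?)",
    ("C01234", 1529000000.0001,
     json.dumps({"text": "hello", "has_link_preview": True})),
)

# Filter on proper columns only; decode JSON after the rows are fetched.
rows = db.execute(
    "SELECT message_id, body FROM messages "
    "WHERE channel_id = ? ORDER BY ts DESC LIMIT 50",
    ("C01234",),
).fetchall()
messages = [{"message_id": mid, **json.loads(body)} for mid, body in rows]
print(messages[0]["text"])  # hello
```

The moment a field like has_link_preview needs to appear in a WHERE clause, it has earned promotion to a real indexed column.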
Interview tip: JSON columns
When you propose a JSON or blob column in a design interview, immediately state what you are giving up: "We lose the ability to index or filter on fields inside the JSON. This is acceptable because these fields are only read after row selection, never used in WHERE clauses." That single clarification shows you understand the tradeoff, not just the convenience.
The Migration Path
Migrating hundreds of billions of rows across a sharded MySQL cluster while serving live traffic is not a single clever technique. It is a sequence of carefully ordered phases, each with its own validation and rollback plan.
Phase 1: Dual-write
The application layer was modified to write every new message to both the old (ts-keyed) and new (message_id-keyed) tables. This is the safest starting point because reads still hit the old table exclusively. If the new table has issues, nothing user-facing breaks.
The dual-write path had to handle a subtle ordering problem. The old table's ts was generated by the application at message creation time. The new table's message_id was generated by MySQL's auto-increment. These two values had to be associated so a ts could be resolved to a message_id and vice versa. The application wrote to the new table first (to get the message_id), then wrote to the old table with the ts, and stored the (ts, message_id) mapping.
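The write ordering described above can be sketched as follows. The in-memory dicts and the counter standing in for MySQL's AUTO_INCREMENT are of course stand-ins, not Slack's implementation:

```python
# Hypothetical in-memory stand-ins for the two tables and the mapping.
new_table: dict[int, dict] = {}     # keyed by message_id
old_table: dict[float, dict] = {}   # keyed by ts
ts_to_id: dict[float, int] = {}     # ts -> message_id mapping
_next_id = 0

def dual_write(channel_id: str, text: str, ts: float) -> int:
    """Write ordering from the migration: new table first (to obtain the
    message_id), then the old table, then record the ts<->id mapping so
    either key can be resolved to the other."""
    global _next_id
    _next_id += 1                    # stands in for MySQL AUTO_INCREMENT
    message_id = _next_id
    new_table[message_id] = {"channel_id": channel_id, "ts": ts, "text": text}
    old_table[ts] = {"channel_id": channel_id, "text": text}
    ts_to_id[ts] = message_id
    return message_id

mid = dual_write("C01234", "hello", 1529000000.000100)
assert ts_to_id[1529000000.000100] == mid  # a ts resolves to its message_id
```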
Phase 2: Backfill
The backfill job was the longest phase. Slack had hundreds of billions of historical messages that needed message_id values assigned and written to the new schema.
The backfill ran as a batch job that:
- Scanned the old table shard by shard, workspace by workspace.
- Assigned monotonically increasing message_id values that preserved the ts ordering within each channel.
- Wrote to the new table with aggressive rate limiting to avoid saturating any shard.
Rate limiting was critical. At full speed, the backfill could generate write load equivalent to production traffic, which would degrade live queries. Slack throttled the backfill to run during off-peak hours and paused automatically when shard load crossed thresholds.
This phase took months. Hundreds of billions of rows, each requiring a new column assignment and a write to a new table structure, with no tolerance for data loss or ordering violations.
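A minimal sketch of such a throttled backfill loop follows; the batch size, load threshold, and callable names are invented for illustration:

```python
import time

def backfill_shard(rows, write_batch, shard_load, max_load=0.7, batch_size=1000):
    """Copy old-schema rows to the new table in rate-limited batches.

    `rows` must yield old-table rows in (channel_id, ts) order so that
    ids assigned downstream preserve per-channel ts ordering.
    `shard_load()` returns the shard's current load as a 0.0-1.0 fraction;
    the loop pauses whenever it crosses the threshold.
    """
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            while shard_load() > max_load:  # back off while the shard is busy
                time.sleep(1)
            write_batch(batch)
            batch = []
    if batch:                               # flush the final partial batch
        write_batch(batch)
```

A production version also needs checkpointing (resume after a crash without rewriting rows) and idempotent writes, which this sketch omits.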
Phase 3: Shadow reads and validation
Before cutting over reads, Slack ran shadow queries: every read request went to both the old and new tables, and a background process compared the results. Any discrepancy was logged, investigated, and fixed before proceeding.
The shadow read phase catches bugs that unit tests miss: encoding differences, timezone edge cases, off-by-one errors in ordering, and corner cases around deleted or edited messages.
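The shadow-read wrapper itself is small; the hard work is in comparing and triaging discrepancies. A sketch, with hypothetical callables:

```python
def shadow_read(query_old, query_new, log_discrepancy):
    """Serve the old-table result to the user; compare against the new
    table and log any mismatch for offline investigation. During this
    phase the new table never affects what users see."""
    old_result = query_old()
    new_result = query_new()
    if old_result != new_result:
        log_discrepancy({"old": old_result, "new": new_result})
    return old_result  # reads still come from the old table
```

In practice the comparison would run off the hot path (sampled, or in a background worker) so shadow reads do not double user-facing latency.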
Phase 4: Gradual read cutover
Reads shifted to the new table using a percentage-based ramp. Start at 1% of reads from the new table, monitor latency and error rates, then increase to 5%, 10%, 25%, 50%, 100%. At each step, Slack could pause or roll back to the old table if metrics degraded.
The entire cutover took weeks. For a system serving tens of millions of users, impatience during migration is the fastest path to an outage.
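One common way to implement such a ramp (not necessarily Slack's) is to hash a stable key into a fixed bucket, so a given channel deterministically reads from one table or the other, and raising the percentage only ever moves channels from old to new:

```python
import zlib

def reads_from_new_table(channel_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket a channel into the rollout. The same
    channel always hashes to the same bucket, so ramping 1% -> 100%
    migrates channels monotonically, never flip-flopping a channel
    between tables from one request to the next."""
    bucket = zlib.crc32(channel_id.encode()) % 100
    return bucket < rollout_percent

assert reads_from_new_table("C01234", 100) is True
assert reads_from_new_table("C01234", 0) is False
```

Keying the ramp on channel_id rather than per-request randomness also keeps each channel's reads internally consistent during the cutover.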
Dual-write is not free
Dual-write doubles your write load and introduces consistency risks. If the write to the new table fails but the write to the old table succeeds, you have a gap. Slack handled this with a reconciliation job that detected and repaired gaps by comparing the two tables periodically. In a design interview, if you propose dual-write, mention the reconciliation mechanism. Otherwise you have an incomplete migration strategy.
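A reconciliation pass can be sketched as a periodic diff between the two tables; `repair` here is a hypothetical callback that re-inserts the missing row into the new table:

```python
def reconcile(old_table: dict, new_by_ts: dict, repair):
    """Periodic gap detection for a dual-write migration: any ts present
    in the old table but absent from the new one marks a failed write to
    the new table. Each gap is handed to `repair` and reported."""
    gaps = []
    for ts, row in old_table.items():
        if ts not in new_by_ts:
            gaps.append(ts)
            repair(ts, row)  # re-insert the row into the new table
    return gaps
```

At Slack's scale a full diff per pass is infeasible, so a real job would scan incrementally (e.g. only rows written since the last checkpoint), but the invariant it enforces is the same.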
Phase 5: Cleanup
After 100% of reads served from the new table for a cooling period (weeks to months), the old table was decommissioned. Slack kept the old table in a read-only state during the cooling period as an emergency fallback.
The System After
The post-migration architecture separated concerns cleanly: message_id for relational integrity, ts for display and sorting, JSON for flexible metadata, and Vitess for horizontal scaling.
Vitess as the sharding layer
The messages table is not a single MySQL instance. It is distributed across hundreds of MySQL shards managed by Vitess, the open-source MySQL sharding middleware originally built at YouTube.
Vitess sits between the application and MySQL. The application sends standard SQL queries to VTGate (Vitess's query router), which parses the query, identifies the target shard based on the workspace_id, and forwards the query to the correct MySQL instance.
Using workspace_id as the shard key means all messages, reactions, threads, and metadata for a single workspace live on the same shard. This is critical because the dominant query pattern (load a channel) operates within a single workspace. There are no cross-shard joins for the common case.
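Hash-range routing of this kind can be sketched as follows. This is a simplification of what VTGate's vindex machinery does: the hash function, shard count, and range boundaries here are invented for illustration:

```python
import bisect
import zlib

# Four illustrative shards, each owning a contiguous slice of the
# 32-bit keyspace: [0, 0x40000000), [0x40000000, 0x80000000), etc.
SHARD_UPPER_BOUNDS = [0x40000000, 0x80000000, 0xC0000000, 0x100000000]

def shard_for(workspace_id: str) -> int:
    """Map a workspace to its shard: hash the shard key into the
    keyspace, then find the range that covers it. Every row for one
    workspace lands on one shard, so channel loads stay shard-local."""
    keyspace_value = zlib.crc32(workspace_id.encode())  # 0 .. 2**32 - 1
    return bisect.bisect_right(SHARD_UPPER_BOUNDS, keyspace_value)
```

Splitting a hot shard then means narrowing one range into two and moving the affected rows, with no change visible to the application.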
The tradeoff is that very large workspaces (Slack's own workspace, for example, or major enterprise customers) create hot shards. Vitess handles this through shard splitting: when a shard grows too large, Vitess can split it into two shards by subdividing the key range. This is transparent to the application.
Operational challenges at shard scale
Running hundreds of MySQL shards introduces operational complexity that a single-instance MySQL never faces.
Schema migrations. Changing a column type or adding an index must be applied to every shard. Vitess provides vtctl ApplySchema for coordinated schema changes, but a bad migration can still cascade across shards. Slack adopted a canary shard strategy: apply the migration to one shard, validate, then roll out to the rest.
Cross-shard queries. Admin tools, compliance exports, and search features sometimes need to query across workspaces. These queries cannot use the shard key for routing, so Vitess scatters the query to all shards and gathers results. Scatter-gather queries are expensive and must be used sparingly. Slack built dedicated read replicas for cross-shard analytics workloads.
Monitoring at shard granularity. A slow shard might indicate a hot workspace, a runaway query, or hardware degradation. Slack's monitoring had to shift from "is the database healthy?" to "which of these 200+ shards is unhealthy and why?"
The Results
The migration delivered measurable improvements across read performance, write efficiency, and schema evolvability.
| Metric | Before (ts-keyed flat schema) | After (message_id + Vitess) |
|---|---|---|
| Tables joined per channel load | 6 | 1-2 (message_id joins) |
| Join key type | 3-column composite with DECIMAL | Single BIGINT |
| Schema changes for new features | ALTER TABLE + backfill per feature | Add JSON field (no schema change) |
| Read amplification (queries per channel view) | 6+ round trips | 1-2 round trips |
| Migration duration | N/A | ~2 years (2018-2020) |
| Rows migrated | N/A | Hundreds of billions |
| Downtime during migration | N/A | Zero |
The biggest win was not raw latency (though that improved) but engineering velocity. Before the migration, adding any feature that touched messages required careful planning around the six-table join chain. After the migration, new features could either add a field to the JSON body (no schema change required) or create a new table keyed on message_id (single-column join, trivial).
For the interview: the metric that matters most here is not p99 latency. It is "time to ship a new message-related feature." Schema design directly determines feature velocity.
What They'd Do Differently
Based on Slack's public engineering blog posts and conference talks from 2019-2020:
Start with a surrogate key from day one. The team acknowledged that using ts as the primary key was a reasonable choice for a startup, but the cost of fixing it later was enormous. If they were starting over, they would use an auto-increment or Snowflake ID as the primary key from the beginning, with ts as a secondary indexed column.
Invest in Vitess tooling earlier. The shard-splitting and schema migration tooling in Vitess required significant custom work to integrate with Slack's deployment pipeline. The team wished they had started that integration work six months earlier.
Build the reconciliation system before starting dual-write. The gap-detection system that ensured consistency between old and new tables was built mid-migration. Building it first would have caught issues earlier and given the team more confidence during the backfill phase.
I've seen this pattern at every company that does a large-scale migration: the scaffolding (validation, reconciliation, monitoring) takes longer than the actual data movement. Teams consistently underestimate scaffolding effort by 2-3x.
Architecture Decision Guide
Use this flowchart when deciding whether to refactor your schema or add more denormalization.
Transferable Lessons
1. Separate identity from display order.
Slack's ts was the sort key, the foreign key, and the permalink anchor. When any of those needed to change independently, the entire schema resisted. Use a surrogate key for relational integrity and keep natural keys for presentation only. This applies to any system where the "natural" key seems convenient but carries multiple meanings.
2. Dual-write migrations require reconciliation, not just replication. Writing to two tables simultaneously is the easy part. Detecting and repairing gaps between the two tables is the hard part. Every dual-write migration needs a reconciliation job that runs continuously, comparing source and target, and alerting on discrepancies. Without it, you are migrating on faith.
3. JSON columns are a schema evolution tool, not a data model. Slack used JSON for display metadata that never appears in WHERE clauses. This is the right use case. JSON columns become a liability when you later need to filter, aggregate, or join on fields inside them. Draw the line at write time: structured queries need columns, display-only data can live in JSON.
4. Feature accretion is a schema design force. No schema survives five years of feature development unchanged. The question is not "will we need to migrate?" but "how painful will the migration be?" Schemas with single-purpose keys and clear separation of concerns are cheaper to migrate than schemas where one column does everything.
5. Shard key choice determines your query economics.
Slack chose workspace_id because the dominant access pattern operates within a single workspace. This made common queries shard-local (cheap) and cross-workspace queries scatter-gather (expensive). When picking a shard key, optimize for the 95th percentile query, not the edge case.
How This Shows Up in Interviews
Slack's migration is directly relevant when you are designing a messaging system, a comments system, or any feature-rich content storage layer.
When to cite this case study: Any question involving "design a chat system" or "design a messaging platform" where the interviewer pushes past the basic schema into how threads, reactions, or edits would work.
The sentence to say: "I'd use a surrogate message_id as the relational key and keep the timestamp for display ordering only, because using the timestamp as a foreign key across feature tables creates expensive composite joins at scale."
| Interviewer asks | Strong answer citing this case study |
|---|---|
| "How would you store messages?" | "Append log with workspace + channel + message_id as PK. The message_id is the FK for reactions, threads, edits. Timestamp is indexed separately for display ordering." |
| "How do you handle threads?" | "Separate threads table keyed on parent_message_id. Single BIGINT join, not a composite timestamp key. Slack learned this the hard way when self-joins on ts became their bottleneck." |
| "What about reactions and edits?" | "Separate tables, all keyed on message_id. Variable display metadata goes in a JSON body column to avoid schema changes for every new feature." |
| "How would you migrate at scale?" | "Dual-write to old and new table, backfill historical rows with rate limiting, shadow-read for validation, then gradual % cutover. Zero downtime, takes months at billions of rows." |
| "How would you shard this?" | "Shard on workspace_id using Vitess. All channel queries are shard-local. Cross-workspace queries (admin, compliance) go to dedicated read replicas via scatter-gather." |
Quick Recap
- Slack's original message schema used (workspace_id, channel_id, ts) as the primary key, optimized for fast appends and chronological reads.
- As threads, reactions, edits, bookmarks, and pins accumulated, every feature required joins keyed on the timestamp, a value never designed for relational operations.
- By 2018, loading a channel view required joins across six tables, all using a three-column composite key with a DECIMAL timestamp.
- Slack introduced a BIGINT message_id as the canonical relational key, keeping ts for display and sorting only.
- Variable message metadata moved into a JSON body column, eliminating schema migrations for new display features.
- The migration used dual-write + backfill + shadow-read + gradual cutover across hundreds of billions of rows with zero downtime over 2018-2020.
- Transferable principle: when a primary key carries multiple responsibilities, separate them before the join chain makes migration impossible.
Related Concepts
- Databases - Foundational understanding of relational vs. document models, indexing strategies, and why join key type matters for query performance.
- Sharding - How Vitess implements horizontal partitioning, shard key selection tradeoffs, and the operational cost of scatter-gather queries.
- Consistency models - Why dual-write migrations create temporary inconsistency windows and how reconciliation jobs restore consistency guarantees.