Dropbox
Walk through a complete Dropbox design: content-addressed chunking for delta sync, conflict copy resolution, and petabyte-scale chunk deduplication for 500M users.
What is Dropbox?
Dropbox is a cloud file-sync service: upload a file on one device and it appears on every other device you own within seconds. Three hard problems hide behind that simplicity: avoiding re-upload of unchanged file content on every edit, preserving both versions when two devices edit the same file offline, and deduplicating 500 petabytes of data where most bytes are identical across users. Content-addressed chunking is the architectural mechanism that solves all three at once.
This makes Dropbox a rich interview question. It touches content addressing and chunked storage, sync conflict resolution in an eventually-consistent system, hierarchical metadata modeling, and at-scale notification delivery. I'd recommend leading with the chunking insight early in the interview, because every other design decision (sync, dedup, versioning) falls out of it naturally.
Functional Requirements
Core Requirements
- Users can upload files from any device.
- Files automatically sync to all of a user's connected devices after an upload.
- Users can share folders and collaborate with other users.
- Users can view and restore previous versions of files.
Below the Line (out of scope)
- Public file sharing via link
- Real-time collaborative editing (that's Google Docs territory)
- Advanced team permissions, audit logs, and compliance reporting
- Media preview and thumbnail generation
The hardest part in scope: Efficiently syncing large files without re-uploading them in full on every edit. The mechanism behind this, content-addressed chunking with client-side delta computation, is the architectural keystone of the entire system.
Public link sharing is below the line because it introduces its own access-control and abuse-prevention surface without changing the upload or sync architecture. To add it, I would generate a short-lived signed token for the file, store it in Redis with a configurable TTL, and serve downloads via a CDN edge with token validation. The file data path is identical to authenticated download.
Real-time collaborative editing is below the line because it requires operational transformation or CRDT logic at the application layer. Dropbox is a sync system, not an editor. Integrating collaborative editing would effectively mean building Google Docs alongside Dropbox.
Advanced team permissions are below the line because they introduce a role and policy evaluation tree that sits on top of, but does not change, the underlying file storage and sync architecture. To add them, I would replace the simple folder_members table with a full RBAC model and evaluate policies at the API Gateway level on every request.
Media thumbnails are below the line because they are a read-side enhancement. To add them, I would trigger an asynchronous job on each confirmed upload that generates thumbnails and stores them in S3 under a predictable key, then serve them via CDN with a separate GET endpoint.
Non-Functional Requirements
Core Requirements
- Durability: Files must never be lost. 11-nines durability, matching S3's guarantee for underlying chunk storage.
- Availability: 99.99% uptime. Availability over consistency: a sync delay of a few seconds is acceptable; losing a file or a confirmed upload is not.
- Consistency: Online devices receive sync notifications within 5 seconds of a change. Offline devices sync on reconnect.
- Latency: Upload begins streaming within 1 second of a file change being detected. Chunk uploads are parallel with no queuing at the application tier.
- Scale: 500M users, 100M DAU. Average 1GB stored per user = 500 PB total. 5 file changes per DAU per day = 500M changes/day = ~5,800 writes/second at average load, peaking at 3x (~17,000/second) during business hours. Max file size: 5GB.
Below the Line
- Sub-100ms sync latency via edge routing
- HIPAA-compliant encryption key management per user
Read/write ratio: Roughly 5:1 reads to writes. Users download and open files far more often than they upload them. Reads are further amplified by multi-device sync: a single upload triggers downloads on all of a user's other devices. Write throughput is the harder engineering constraint at 17,000 peak writes/second; read throughput is handled by CDN caching of S3 chunks.
I target 5-second end-to-end sync for online devices. A filesystem event propagates in under 100ms, chunk hashing takes 1-2 seconds for a typical file, and the network upload of only changed chunks takes the bulk of that budget. The 5-second window is generous but bounded by client-side hashing time, not server-side architecture. I'd state this latency budget upfront in an interview because it immediately tells the interviewer you understand where the bottleneck actually lives.
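As a sanity check, the write-rate numbers above can be reproduced with a few lines of arithmetic. This is an illustrative sketch; the 3x peak factor is the assumption stated in the scale estimate.

```python
# Back-of-envelope check of the write-rate estimate (illustrative only).
DAU = 100_000_000          # daily active users
CHANGES_PER_DAU = 5        # file changes per DAU per day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3            # assumed diurnal peak multiplier

avg_writes_per_sec = DAU * CHANGES_PER_DAU / SECONDS_PER_DAY
peak_writes_per_sec = avg_writes_per_sec * PEAK_FACTOR

print(round(avg_writes_per_sec))   # roughly 5,800 writes/second average
print(round(peak_writes_per_sec))  # roughly 17,000 writes/second at peak
```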
Core Entities
- File: Metadata record for a file: `file_id`, `name`, `owner_id`, `folder_id`, `size`, `content_hash` (SHA-256 of the full file), `created_at`, `updated_at`, `current_version_id`.
- Chunk: A fixed-size (4MB) block of file data identified by its SHA-256 hash. Multiple files may reference the same chunk if they share content. Chunk data lives in S3; the registry lives in PostgreSQL.
- Folder: A container for files and sub-folders. Has `folder_id`, `name`, `parent_folder_id`, `owner_id`, and a materialised `path` string encoding ancestor folder IDs.
- FileVersion: Every confirmed upload creates a new `FileVersion` record linking to the chunk manifest at that point in time. Restoring a version is a metadata operation, not a data movement.
- User: Account with `user_id`, device list, and storage quota. Schema details are deferred to deep dive 4.
Schema details, indexes, and partition key design are covered in the metadata storage deep dive. The five entities above are sufficient to drive the API design and High-Level Design.
API Design
Start with one endpoint per functional requirement, then evolve where the naive shape breaks down.
FR 1: Upload a file (naive):
POST /files
Body: { name, folder_id, size, file_bytes }
Response: { file_id }
FR 1: Download a file:
GET /files/{file_id}
Response: { file_id, name, size, download_url }
These two endpoints satisfy the upload requirement on paper, but POST /files with raw bytes means the Upload Service must receive every byte of every file. For a 5GB file at scale, that is a bandwidth bottleneck that cannot be horizontally scaled away (you would need to buffer the entire file before writing to S3). The upload endpoint must be redesigned as a three-step protocol: declare the manifest, upload chunks directly to S3, confirm.
FR 1 (evolved): Declare chunk manifest:
POST /files
Body: { name, folder_id, size, content_hash, chunks: [{ index, chunk_hash }] }
Response: { file_id, missing_chunks: [chunk_hash], upload_urls: { chunk_hash: presigned_url } }
FR 1 (evolved): Upload a chunk directly to S3:
PUT {presigned_url}
Body: raw chunk bytes
Response: HTTP 200
FR 1 (evolved): Confirm upload complete:
PUT /files/{file_id}/confirm
Body: { chunks: [{ index, chunk_hash }], device_last_sync_token }
Response: { file_id, version_id }
Why three steps instead of one: The manifest step lets the server identify which chunks are already stored before any bytes transfer. The pre-signed URL step offloads bulk data transfer from the Upload Service directly to S3, so the Upload Service only handles lightweight metadata coordination regardless of file size. The confirm step is the authoritative write: metadata and version records are only committed once all chunks are durably in S3.
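The manifest step can be sketched server-side as follows. This is a minimal in-memory sketch: `chunk_registry`, `presign_put`, and `declare_manifest` are illustrative stand-ins, not a real PostgreSQL or S3 API.

```python
import uuid

# Illustrative stand-in for the Chunk Registry: the set of hashes already in S3.
chunk_registry = {"a" * 64}

def presign_put(chunk_hash: str) -> str:
    # A real implementation would ask S3 for a time-limited presigned PUT URL.
    return f"https://s3.example.com/chunks/{chunk_hash}?sig={uuid.uuid4().hex}"

def declare_manifest(chunks: list[dict]) -> dict:
    """Step 1 of the protocol: identify missing chunks before any bytes move."""
    missing = [c["chunk_hash"] for c in chunks
               if c["chunk_hash"] not in chunk_registry]
    return {
        "file_id": uuid.uuid4().hex,
        "missing_chunks": missing,
        "upload_urls": {h: presign_put(h) for h in missing},
    }
```

A client declaring one already-stored chunk and one new chunk gets back exactly one upload URL; the stored chunk transfers zero bytes.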
`device_last_sync_token` in confirm: The timestamp of the last sync event the uploading device processed. The server compares it to `file.updated_at` to detect whether another device modified the file while this device was offline. If so, a conflict copy is created rather than overwriting. Covered in deep dive 2.
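The conflict check itself is a single timestamp comparison. A minimal sketch, with `resolve_upload` as a hypothetical helper name:

```python
from datetime import datetime

def resolve_upload(file_updated_at: datetime,
                   device_last_sync_token: datetime) -> str:
    """Decide whether a confirmed upload overwrites or forks a conflict copy."""
    if file_updated_at > device_last_sync_token:
        # Another device changed the file after this device last synced:
        # keep both edits by creating a conflict copy in the same folder.
        return "conflict_copy"
    return "overwrite"
```

No distributed lock is needed; the decision is made at confirm time from data the server already holds.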
FR 2: Poll for sync changes (long-poll):
GET /sync/changes?cursor={timestamp}&timeout=30
Response: { changes: [{ file_id, event_type, version_id }], new_cursor }
The cursor is the updated_at timestamp of the last delivered change. Long-poll with a 30-second timeout means the connection hangs open until an event fires or the timeout expires, then the client immediately re-connects. This avoids a persistent WebSocket per device while still delivering changes within seconds.
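The cursor semantics can be sketched as a pure function over an in-memory change log. `fetch_changes` is an illustrative name; the real query runs against PostgreSQL with a composite index.

```python
def fetch_changes(changes: list[dict], cursor: float, limit: int = 100) -> dict:
    """Return changes newer than the cursor, plus the cursor for the next poll."""
    newer = sorted((c for c in changes if c["updated_at"] > cursor),
                   key=lambda c: c["updated_at"])[:limit]
    # If nothing changed, the cursor stands still and the client re-polls.
    new_cursor = newer[-1]["updated_at"] if newer else cursor
    return {"changes": newer, "new_cursor": new_cursor}
```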
FR 3: List folder contents:
GET /folders/{folder_id}/contents
Response: { files: [...], folders: [...] }
FR 3: Share a folder:
POST /folders/{folder_id}/members
Body: { user_id, permission: "view" | "edit" }
Response: { share_id }
FR 4: List file versions:
GET /files/{file_id}/versions
Response: { versions: [{ version_id, size, created_at, created_by_device }] }
FR 4: Restore a version:
POST /files/{file_id}/restore/{version_id}
Response: { file_id, version_id }
Restore is a POST, not a PUT, because it creates a new version record pointing to the historical chunk manifest. No file bytes move; the metadata pointer changes. The response triggers a FileChangedEvent that syncs all connected devices.
High-Level Design
1. Users can upload files
Start simple: the client detects a file change, reads the file, and POSTs the raw bytes to the Upload Service, which writes them to S3 and records metadata in PostgreSQL.
Components:
- Client Sync Engine: Monitors the filesystem for changes and POSTs the full file bytes to the Upload Service.
- Upload Service: Receives file bytes, writes them to S3 under a generated key, records metadata in PostgreSQL.
- Object Store (S3): Stores file data, one object per upload.
- Metadata DB (PostgreSQL): Stores file records: `file_id`, `name`, `owner_id`, `folder_id`, `size`, `s3_key`.
Request walkthrough:
- Client detects a modified file via filesystem watcher.
- Client reads the entire file.
- Client POSTs the raw bytes to `POST /files`.
- Upload Service writes the bytes to S3 under a generated key.
- Upload Service inserts file metadata into PostgreSQL and returns `file_id`.
This works for small files. It breaks immediately on the scale NFR: editing one paragraph of a 1GB file re-uploads 1GB. At 100M DAU making 5 edits per day on 50MB average files, that is 25 PB of daily upload traffic, most of it unchanged bytes. The Upload Service becomes a bandwidth bottleneck routing every byte through itself, and two users uploading the same Node.js installer store two full copies with zero sharing.
I always start with this naive design in interviews, not because it's viable, but because the specific way it breaks is what motivates every subsequent design decision.
Evolving the design: content-addressed chunking
The key insight is that most of a file's content does not change between edits. If each block of bytes is identified by its SHA-256 hash, the server can answer "which blocks are new?" without reading any file content (just check whether the hash is in the registry). The client then uploads only the missing blocks, in parallel, directly to S3.
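The client-side chunking step can be sketched in a few lines, assuming the fixed 4MB chunk size used throughout this design (a production client might use content-defined chunk boundaries instead):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB, as assumed throughout this design

def build_manifest(data: bytes) -> list[dict]:
    """Split file bytes into fixed-size chunks and hash each one."""
    return [
        {"index": i,
         "chunk_hash": hashlib.sha256(data[off:off + CHUNK_SIZE]).hexdigest()}
        for i, off in enumerate(range(0, len(data), CHUNK_SIZE))
    ]
```

Because the hash depends only on chunk content, two files sharing a block produce the same `chunk_hash`, which is exactly what makes cross-user deduplication free.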
Components (evolved):
- Client Sync Engine: Splits modified files into 4MB chunks, computes SHA-256 per chunk, and orchestrates a three-step upload: declare the manifest, upload only missing chunks directly to S3, confirm.
- Upload Service: Receives the chunk manifest, queries the Chunk Registry for missing hashes, returns pre-signed S3 PUT URLs for missing chunks, and writes metadata on confirm. Never handles raw chunk bytes.
- Chunk Registry (PostgreSQL): Maps `chunk_hash → s3_key` with a reference count. O(1) existence check per hash.
- Object Store (S3): Stores immutable content-addressed chunk data keyed by SHA-256 hash. Shared across users and versions.
- Metadata DB (PostgreSQL): Stores file and folder records, version history, and the chunk-to-file mapping.
Request walkthrough (evolved):
- Client detects a new or modified file via filesystem watcher.
- Client splits the file into 4MB chunks and computes SHA-256 for each chunk.
- Client sends `POST /files` with the full chunk manifest (list of hashes).
- Upload Service queries the Chunk Registry: `SELECT chunk_hash FROM chunks WHERE chunk_hash IN (?)`.
- Upload Service returns `missing_chunks` and pre-signed S3 PUT URLs for each missing chunk.
- Client uploads missing chunks in parallel directly to S3 using the pre-signed URLs.
- Client sends `PUT /files/{file_id}/confirm` once all parallel uploads complete.
- Upload Service writes file metadata and chunk references to PostgreSQL and publishes a `FileChangedEvent`.
The client uploads chunks directly to S3. The Upload Service handles only lightweight metadata coordination, not bulk data transfer. At 5GB max file size and 4MB chunks, that means the Upload Service processes at most 1,280 hash lookups per upload, each a point-read O(1) query.
I always write to PostgreSQL before publishing the change event. If those two operations were reversed, a device could receive a sync notification for a file whose metadata doesn't exist yet in the DB.
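The ordering rule can be made concrete with an in-memory sketch; `db` and `bus` are illustrative stand-ins for PostgreSQL and Redis Pub/Sub, not real clients.

```python
db: dict = {}    # stand-in for the files table
bus: list = []   # stand-in for the Redis Pub/Sub channel

def confirm_upload(file_id: str, version_id: str, user_id: str) -> None:
    # 1. Authoritative write: metadata must be durable before anyone is told.
    db[file_id] = {"current_version_id": version_id}
    # 2. Only then publish, so any notified device can read the new row.
    bus.append({"channel": f"user:{user_id}",
                "event": {"file_id": file_id, "version_id": version_id}})
```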
2. Files automatically sync to all connected devices
The sync path: after a confirmed upload, the Upload Service publishes a FileChangedEvent; the Notification Service delivers it to all of the owner's waiting long-poll connections; each device downloads only the chunks it does not have locally.
When Device A uploads a modified file, all other devices owned by the same user must be notified and download the delta. This requires a server-to-client push mechanism.
Components:
- Notification Service: Holds long-poll HTTP connections for each connected client device. On receiving a `FileChangedEvent` from Redis Pub/Sub, it responds immediately to all waiting connections for that user.
- Change Pipeline (Redis Pub/Sub): Carries `FileChangedEvent { user_id, file_id, version_id }` from the Upload Service to the Notification Service. Channel keyed by `user_id`.
- Download Service: Given a `file_id`, returns the chunk manifest and pre-signed S3 read URLs for each chunk. Clients call this after receiving a change notification.
Request walkthrough:
- Upload Service confirms the upload and publishes `FileChangedEvent { user_id, file_id, version_id }` to the `user:{user_id}` Redis channel.
- Notification Service (subscribed to `user:{user_id}`) receives the event.
- Notification Service responds to all long-poll connections waiting on that user's change feed (excluding the originating device, identified by the `Device-Id` header).
- Each syncing device calls `GET /files/{file_id}` to retrieve the new chunk manifest.
- Device compares the manifest against its local chunk cache: missing hashes = chunks to download.
- Device downloads missing chunks in parallel via pre-signed S3 read URLs.
Long-polling over WebSockets is the right tradeoff here. File changes are infrequent: a device might sync 5 times per day, so persistent WebSocket connections waste server file descriptors for almost-never-fired events. Long-polling adds at most 1-5 seconds of extra latency in the worst case, within the 5-second sync NFR. I've seen teams default to WebSockets for everything and then struggle with connection management at 100M devices. Ask yourself how often events actually fire before choosing your push mechanism.
3. Users can share folders with collaborators
The sharing path: a share record is written to the folder_members table; on every upload to a shared folder, the Upload Service fans out FileChangedEvents to all collaborators' user IDs alongside the owner's.
Sharing multiplies the sync fan-out: a file uploaded to a shared folder must notify not just the owner's devices but all collaborators' devices.
Components:
- Share Service: Validates the share request and writes a row to the `folder_members` table: `{ folder_id, user_id, permission }`.
- Upload Service (updated): On upload confirm, queries `folder_members` for the file's folder and publishes one `FileChangedEvent` per collaborator `user_id` to Redis Pub/Sub.
- Metadata DB (updated): `folder_members` table with an index on `(folder_id)` for fast fan-out lookup and `(user_id)` for "show me all shared folders" queries.
Request walkthrough (sharing):
- Owner sends `POST /folders/{folder_id}/members` with `{ user_id, permission: "edit" }`.
- Share Service validates the owner's permission and inserts a row into `folder_members`.
- Next time any collaborator uploads a file to that folder, Upload Service queries `SELECT user_id FROM folder_members WHERE folder_id = ?`.
- Upload Service publishes `FileChangedEvent` for each returned `user_id` (owner plus all collaborators).
- Each user's Notification Service delivers the event to their connected devices.
For folders shared with up to 100 users, the inline SELECT and sequential pub/sub loop completes well within latency budget. Beyond 100 collaborators (shared team folders), the same hybrid fan-out pattern from WhatsApp group messaging applies: publish a single event to Kafka and fan out asynchronously via workers. That threshold is a deep dive concern.
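The inline fan-out loop can be sketched as follows, with `folder_members` and `publish` as in-memory stand-ins for the PostgreSQL table and the Redis Pub/Sub client:

```python
# Illustrative stand-ins: a folder shared with an owner plus two collaborators,
# and a captured list of published events in place of a Redis client.
folder_members = {"folder-1": ["owner", "alice", "bob"]}
published: list = []

def publish(channel: str, event: dict) -> None:
    published.append((channel, event))

def fan_out_change(folder_id: str, file_id: str, version_id: str) -> None:
    """Publish one FileChangedEvent per member of the file's folder."""
    for user_id in folder_members.get(folder_id, []):
        publish(f"user:{user_id}",
                {"file_id": file_id, "version_id": version_id})
```

For folders under the ~100-member threshold this loop runs inline on confirm; above it, the same events would be handed to Kafka workers instead.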
4. Users can view and restore previous file versions
The version path: every confirmed upload inserts a FileVersion row linked to the current chunk manifest; restoring a version is a metadata-only operation that creates a new FileVersion pointing to the historical chunk set.
Version history is purely a metadata concern. The chunks themselves are already immutable in S3 and are referenced by the FileVersion record. Restoring a version does not move any data.
Components:
- Version Service: Lists `FileVersion` records for a file and handles restore requests by creating a new version pointing to the historical chunk manifest, then publishing a change event to trigger sync.
- Metadata DB (updated): `file_versions` table: `version_id`, `file_id`, `chunk_manifest`, `created_by_device`, `created_at`, `size`.
Request walkthrough (restore):
- Client sends `GET /files/{file_id}/versions` to list all saved versions.
- Client selects a historical version and sends `POST /files/{file_id}/restore/{version_id}`.
- Version Service reads the chunk manifest from the historical `file_versions` row.
- Version Service inserts a new `FileVersion` row pointing to the restored chunk manifest. No data moves.
- Version Service updates the file's `current_version_id` in the `files` table.
- Version Service publishes `FileChangedEvent`, triggering sync on all devices.
Restoring a version creates zero new S3 uploads. The historical chunks already exist with ref_count greater than zero because the old version record still references them. Version retention is simply keeping old FileVersion rows alive in PostgreSQL.
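The restore path reduces to pure metadata bookkeeping. In this sketch the dictionaries stand in for the `file_versions` and `files` tables; the real operation is two row writes in one transaction.

```python
import itertools

# Illustrative in-memory tables: one existing version of file "f1".
next_version_id = itertools.count(2)
file_versions = {1: {"file_id": "f1", "chunk_manifest": ["h1", "h2"]}}
files = {"f1": {"current_version_id": 1}}

def restore(file_id: str, version_id: int) -> int:
    """Create a new version pointing at the historical chunk manifest."""
    manifest = file_versions[version_id]["chunk_manifest"]  # no bytes copied
    new_id = next(next_version_id)
    file_versions[new_id] = {"file_id": file_id, "chunk_manifest": manifest}
    files[file_id]["current_version_id"] = new_id
    return new_id  # the caller would publish FileChangedEvent here
```

The old version row survives untouched, so its chunks keep a nonzero ref_count and are never garbage-collected out from under the restore.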
Potential Deep Dives
1. How do we efficiently sync large files?
Three constraints define the sync efficiency problem:
- A 1GB file edited in one paragraph should not require re-uploading 1GB.
- Delta computation must work without the server reading file content.
- Multiple missing chunks must be uploadable in parallel.
2. How do we handle sync conflicts?
Two constraints define the conflict problem:
- Two devices can edit the same file while offline. When they reconnect, one edit would overwrite the other without warning.
- Conflict resolution must not require a distributed lock (that would make offline editing impossible).
3. How do we store and deduplicate petabytes of file data?
Three access patterns drive storage requirements:
- Write at scale: ~5,800 chunk writes per second at average load, ~17,000 at peak.
- Deduplication check: For every upload, determine which chunk hashes already exist.
- Read performance: Chunk reads dominate; most content is served via CDN-cached S3 pre-signed URLs.
4. How do we design metadata storage for billions of files?
Three access patterns require fast queries:
- Folder listing: Return all direct children of a folder, sorted by name or date modified.
- Sync change feed: Given a cursor timestamp, return all file changes for a user after that point.
- Folder hierarchy traversal: Resolve ancestors and descendants for rename operations and deep browsing.
Final Architecture
Content-addressed chunking is the architectural keystone. It delivers delta sync so modified files upload only changed 4MB blocks, cross-user deduplication so identical chunks are stored once across all 500M users, and parallel direct-to-S3 uploads that bypass the application tier entirely.
PostgreSQL sharded by owner_id handles hierarchical file metadata with index-only single-level reads. Redis Pub/Sub fans out FileChangedEvents to all connected devices within milliseconds of a confirmed upload, keeping sync latency well within the 5-second NFR without a persistent WebSocket per device.
Interview Cheat Sheet
- Start by separating file metadata (what files exist, where, what versions) from file content (raw chunk bytes). These are two different storage problems with different scale, durability, and access pattern requirements.
- The most important design decision is content-addressed chunking. Files are split into 4MB chunks keyed by SHA-256. The server checks hashes, not file content. Delta sync, deduplication, and parallel uploads all fall out of this single decision.
- Delta sync is a consequence of chunking, not a separate feature. Editing one paragraph of a 1GB file changes at most 2 of 256 chunks. The client uploads 8MB instead of 1GB: a 128x reduction.
- Cross-user deduplication is free. Two users with the same installer upload the same chunk hashes. The second user gets back 0 missing chunks and transfers zero bytes.
- Clients upload chunks directly to S3 via pre-signed URLs. The Upload Service handles only metadata coordination, never bulk bytes. At full scale, raw data transfer never touches the application tier.
- Long-polling for sync notifications, not WebSockets. File changes happen a few times per day per user, not in real time. Long-polling adds at most 1-5 seconds of extra latency (within the 5-second NFR) while consuming far fewer server resources than persistent WebSocket connections at this scale.
- Conflict resolution: do not reject the second upload and do not silently overwrite. Create a conflict copy in the same folder with the device name and date in the filename. Both versions survive; the user resolves.
- The conflict signal is `device_last_sync_token` compared to `file.updated_at`. If the file was modified after the device's last sync, create a conflict copy. No distributed lock needed.
- PostgreSQL for file metadata with an adjacency list for the folder hierarchy. Single-level listing uses `WHERE folder_id = ?`: an index scan, not a LIKE. Folder rename touches one row because the path encodes folder IDs, not names.
- Sync change feed is `SELECT ... WHERE owner_id = ? AND updated_at > cursor ORDER BY updated_at LIMIT 100`: one range scan on a composite index. The cursor is a timestamp stored per device.
- Chunk GC is reference-counted. Increment on upload confirm, decrement on version deletion, delete from S3 when ref_count hits zero. Use `INSERT ... ON CONFLICT DO UPDATE` for an atomic upsert; never SELECT then INSERT.
- Shard PostgreSQL by `owner_id`. Every query is already scoped to one user, so there are no cross-shard queries for normal operations.
- Storage tiering reduces cost by roughly 70% for inactive data: Standard for hot chunks, Glacier after 90 days, Deep Archive after 2 years. Attach lifecycle rules to S3 based on `last_accessed_at` in the Chunk Registry.