File Downloader
Design a production-grade file download service: walk through pre-signed URLs, HTTP Range requests, parallel multipart downloads, CDN offloading, and pause-resume state management across 10M concurrent clients.
What is a file download service?
A file download service delivers files from server storage to clients reliably and efficiently. The simplicity is deceptive: the real engineering challenge is serving arbitrary byte ranges from multi-gigabyte files concurrently across millions of connections, without flooding your origin storage or losing a 5 GB resume state when a mobile client drops off the network for 30 minutes. This question tests CDN architecture, HTTP Range requests, connection management, distributed session state, and how to separate the control plane (session management) from the data plane (moving bytes).
Functional Requirements
Core Requirements
- Users can initiate file downloads from web or mobile clients.
- Downloads can be paused and resumed; state persists across client restarts.
- The client reports real-time download progress (bytes received out of total).
- The system handles files from 1 KB to 50 GB.
Below the Line (out of scope)
- File upload pipeline
- Per-file digital rights management (DRM)
- Per-user download speed throttling
- Download scheduling and queue prioritization
The hardest part in scope: Serving byte-range requests for 50 GB files across 10 M concurrent connections without proxying bytes through application servers. The challenge sits at the intersection of CDN edge caching, Range request semantics, and distributed session state for pause-and-resume.
File upload is below the line because it is an independent write path with different reliability constraints (chunked upload, deduplication, virus scanning) that do not influence the download architecture.
DRM is below the line because it requires license server integration, per-device key management, and encrypted segment delivery. To add it, I would generate a time-limited DRM token alongside the pre-signed download URL and configure the CDN to enforce token validity before serving each segment.
Speed throttling is below the line because it adds significant CDN configuration complexity. To add it, I would configure per-client token bucket rate limits at the CDN edge. That configuration layer sits above the download path we are designing here.
Non-Functional Requirements
Core Requirements
- Availability: 99.99% uptime. Availability over strict consistency; a slightly stale progress count is far better than an unavailable download endpoint.
- Latency: First byte under 200 ms p99 for CDN-cached files; under 1 s for origin-fetched cold files.
- Throughput: 10 M concurrent downloads sustained; p99 single-client transfer speed at or above 50 Mbps.
- Integrity: Every delivered file must SHA-256-match the server-recorded checksum. Partial delivery must also be verifiable per chunk.
- Range support: Server must respond correctly to any Range: bytes=X-Y request within a file, enabling pause-resume and parallel multi-part download.
Below the Line
- Geographic IP-based content restrictions
- Per-user bandwidth billing and metered egress quotas
- Real-time download analytics (top files, geographic heat maps)
Read/write ratio: For popular software packages or media files, the read-to-write ratio reaches 1,000,000:1 or more. A single OS release uploaded once can be downloaded by tens of millions of clients over days. The entire design is a read optimization problem. Every architectural decision traces back to this skew: how do we move bytes from origin to client without the origin seeing most of the traffic?
Under 200 ms first-byte latency for cached files demands CDN edge placement within one network hop of the client. The 10 M concurrent download target rules out proxying bytes through application servers entirely: even ignoring bandwidth, each open TCP connection parks a goroutine or thread, and 10 M goroutines is not a fleet you want to maintain.
Core Entities
- File: The downloadable artifact. Carries file_id, name, size_bytes, content_type, storage_key (S3 object key), checksum (SHA-256 of full file), and created_at.
- DownloadSession: Tracks pause-and-resume state. Carries session_id, file_id, client_id, bytes_confirmed (last checkpointed byte offset), total_bytes, status (active, paused, completed, expired), download_url, and expires_at.
- FileChunk: Addressable sub-range of a large file (introduced in the integrity deep dive). Carries chunk_index, file_id, start_byte, end_byte, and a per-chunk checksum.
Full schema with indexes and access patterns is deferred to the data model section of the deep dives. These three entities are sufficient to drive the API design and High-Level Design from here.
API Design
FR 1 - Initiate a download:
The naive instinct is a direct GET that streams the full file:
GET /files/{file_id}
Response: 200 OK, Content-Type: application/octet-stream, binary body
This breaks immediately at scale: the API server proxies all bytes, consuming one thread or goroutine per active connection. At 10 M concurrent downloads, that is 10 M goroutines before accounting for bandwidth.
The evolved shape separates intent from delivery. The client tells the API what it wants; the API returns a URL pointing directly to the CDN:
POST /downloads
Body: { file_id, client_id? }
Response: { session_id, download_url, file_size, checksum, content_type, expires_at }
The download_url is a pre-signed URL pointing to the CDN or S3 directly. The client fetches from that URL without touching the application server again:
GET {download_url}
Headers: Range: bytes=0- (optional; omit for full file)
Response: 200 OK
Accept-Ranges: bytes
Content-Disposition: attachment; filename="large-file.zip"
Content-Length: 10737418240
206 Partial Content (when Range header present)
Content-Range: bytes 104857600-10737418239/10737418240
Accept-Ranges: bytes
POST over GET for session creation: creating a DownloadSession is a write with a persistent side effect. POST is correct. The pre-signed URL encodes the authorization; the CDN validates it at the edge on each range request without calling back to the application server.
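To make that edge validation concrete, here is a minimal sketch of pre-signed URL generation and verification, assuming an HMAC-SHA256 signature over the path and expiry, a secret shared between the API server and the CDN edge, and illustrative sig/expires parameter names (real providers such as CloudFront or S3 have their own signature schemes):

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"shared-secret-between-api-and-cdn-edge"  # illustrative key

def sign_url(path: str, ttl_seconds: int, now=None) -> str:
    """Control plane: append an expiry and an HMAC over (path, expiry)."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'sig': sig})}"

def validate_url(url: str, now=None) -> bool:
    """Edge check: recompute the HMAC and enforce expiry. No API callback."""
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    try:
        expires = int(qs["expires"][0])
        sig = qs["sig"][0]
    except (KeyError, ValueError):
        return False
    if (now if now is not None else time.time()) > expires:
        return False  # expired: client must POST /downloads for a fresh URL
    msg = f"{parsed.path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Validation is pure computation, which is exactly why the CDN can afford to run it inline on every range request without a round-trip to the API server.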
FR 2 - Pause and resume:
PATCH /downloads/{session_id}
Body: { status: "paused" | "resumed", bytes_confirmed: 104857600 }
Response: { session_id, status, bytes_confirmed, download_url }
bytes_confirmed is the last byte offset the client has fully received and verified. On resume, the client issues a Range: bytes={bytes_confirmed}- request against the download_url.
GET /downloads/{session_id}
Response: { session_id, file_id, status, bytes_confirmed, total_bytes, download_url, expires_at }
GET fetches current session state when a client restarts and needs to know where it left off.
FR 3 - Progress:
Progress is primarily tracked client-side: the download client counts bytes received in memory and renders a progress bar locally. Server-side progress is useful for server-to-server transfers or multi-device syncing:
GET /downloads/{session_id}/progress
Response: { bytes_confirmed, total_bytes, percentage, status, transfer_rate_bps }
Authorization note: All endpoints assume a client token in the Authorization header. The pre-signed download URL embeds authorization in its HMAC signature; the CDN validates it without a round-trip to the API server. If authentication is added to scope, I would associate each session with a user_id from the session token and enforce it on POST /downloads.
High-Level Design
1. Initiating a download - the naive proxy path
The simplest possible design streams bytes through the API server itself.
Components:
- Client: Web or mobile app sending POST /downloads, then fetching the file.
- API Server: Creates a session, fetches the file from S3, and streams bytes back to the client.
- Sessions DB: Stores download session rows (session_id, file_id, bytes_confirmed, status).
- S3 / Object Storage: Stores the raw file bytes.
Request walkthrough:
- Client POSTs to /downloads with file_id.
- API Server creates a session row in the Sessions DB.
- API Server opens a GET to S3 for the full file.
- API Server streams S3 response bytes directly back to the client.
- Client accumulates bytes to disk.
This is the write path and byte delivery path combined. The API server here carries all download bandwidth.
What breaks: At 10 M concurrent downloads at 50 Mbps each, the API tier must sustain 500 Tbps of egress. No application server fleet handles that at reasonable cost. Each connection parks a goroutine for the duration of a multi-hour large file download. S3 charges per API call and per-GB egress; every client download hits S3 directly.
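The arithmetic behind that claim is worth having at your fingertips; a quick sanity check (the 8 KB per-connection figure is an illustrative assumption for a lean goroutine stack plus buffers):

```python
concurrent = 10_000_000
per_client_bps = 50 * 10**6                  # 50 Mbps per client
aggregate_bps = concurrent * per_client_bps  # 5e14 bps = 500 Tbps of egress
print(aggregate_bps / 10**12, "Tbps")

# Even ignoring bandwidth, per-connection state alone is prohibitive:
per_conn_bytes = 8 * 1024                    # assumed 8 KB stack + buffers
print(concurrent * per_conn_bytes / 2**30, "GiB of connection state")
```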
2. Evolved - pre-signed URL and CDN offloading
The fix decouples the control plane (deciding what to download and authorizing it) from the data plane (actually moving bytes).
Components:
- Client: Sends POST /downloads once to get a pre-signed URL, then fetches from the CDN directly. Never talks to the API server for bytes.
- API Server (control plane only): Creates sessions, generates pre-signed URLs, handles pause/resume PATCH calls. Touches zero bandwidth.
- CDN (data plane): Serves file bytes directly to clients. Caches popular files at edge PoPs. Validates pre-signed URL HMAC signatures inline. Handles Range requests natively.
- Origin Storage (S3): Backing store. CDN pulls from here only on a cache miss.
- Sessions DB: Stores session state including bytes_confirmed for resume.
Request walkthrough:
- Client POSTs to /downloads with file_id.
- API Server looks up file metadata (size, checksum, storage_key) in the File DB.
- API Server creates a DownloadSession row in the Sessions DB.
- API Server generates a pre-signed URL pointing to the CDN (expiry + HMAC signature in query params).
- API Server returns { session_id, download_url, file_size, checksum }.
- Client GETs the download_url directly from the CDN, bypassing the API server entirely.
- CDN validates the URL signature at the edge, checks its cache, and serves bytes from cache or pulls once from S3.
- Client receives bytes from the nearest CDN PoP.
The API server is now a session management service, not a byte pipe. It handles thousands of control calls per second (cheap), while the CDN handles millions of concurrent byte streams (exactly what CDNs are designed for). I'd call this architectural separation out early in your interview because it frames every subsequent decision.
3. Pause and resume - Range request mechanics
Pause-and-resume is HTTP Range requests applied to session state. When a client pauses at byte 104,857,600 (100 MB into a 10 GB file), it records that checkpoint in the session. On resume, it issues a Range: bytes=104857600- request against the same download_url.
Components added:
- DownloadSession state machine: Transitions from active to paused back to active, then to completed or expired.
- Range request to CDN: Resume is a GET with Range: bytes={bytes_confirmed}- on the original download_url. The CDN serves from the checkpointed offset, not from byte 0.
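The state machine above is small enough to pin down as an explicit transition table; anything outside the table (for example, resuming an expired session) is rejected and forces a fresh POST /downloads. A sketch, mirroring the statuses listed in the DownloadSession entity:

```python
# Allowed DownloadSession transitions; anything else is rejected.
ALLOWED = {
    "active":    {"paused", "completed", "expired"},
    "paused":    {"active", "expired"},
    "completed": set(),  # terminal
    "expired":   set(),  # terminal
}

def transition(current: str, target: str) -> str:
    """Validate a session status change before persisting it."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```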
Pause walkthrough:
- Client decides to pause (user action or network drop).
- Client sends PATCH /downloads/{session_id} with { status: "paused", bytes_confirmed: 104857600 }.
- API Server updates the session row.
- Client closes the CDN connection.
Resume walkthrough:
- Client restarts and loads the saved session_id.
- Client sends GET /downloads/{session_id} to retrieve download_url and bytes_confirmed.
- If the URL has expired (403 from CDN), client calls POST /downloads for a fresh pre-signed URL, resuming from the server-stored bytes_confirmed.
- Client GETs the download_url with header Range: bytes=104857600-.
- CDN responds 206 Partial Content with Content-Range: bytes 104857600-10737418239/10737418240. If the CDN instead returns 416 Range Not Satisfiable, bytes_confirmed exceeds the current file size (the file was likely replaced); the client treats a 416 like a 403: create a fresh session (POST /downloads) and restart the download from byte 0.
- Client continues accumulating bytes from offset 104,857,600.
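The resume decisions in that walkthrough collapse into one client-side function over the CDN's status code. A sketch with illustrative action names; the 200 case, an assumption beyond the walkthrough, covers an origin that silently ignores Range headers, which the client must detect rather than splice mismatched bytes:

```python
def next_action(status: int, bytes_confirmed: int):
    """Map the CDN's response on resume to the client's next step.

    Returns (action, offset):
      - "continue":    keep reading the 206 body from bytes_confirmed
      - "refresh_url": pre-signed URL rejected; POST /downloads, resume at offset
      - "restart":     file changed underneath us; new session, byte 0
    """
    if status == 206:
        return ("continue", bytes_confirmed)
    if status == 403:  # expired or invalid signature, rejected at the edge
        return ("refresh_url", bytes_confirmed)
    if status == 416:  # Range Not Satisfiable: offset beyond current file size
        return ("restart", 0)
    if status == 200:
        # Server ignored the Range header and is sending the full body;
        # restarting cleanly beats appending mismatched bytes.
        return ("restart", 0)
    raise RuntimeError(f"unexpected resume status {status}")
```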
One critical point to raise in your interview: pause-and-resume requires both the origin storage and the CDN to support Range requests. S3 supports them natively. CDN support for partial content caching (caching the 206 response for a byte range) requires explicit configuration; not all CDN providers cache partial responses by default.
4. Large file support - parallel multi-part download
A single sequential Range request runs over one TCP connection, whose throughput is often well below the path capacity. For a 10 GB file at 100 Mbps of effective per-connection throughput, that means roughly a 13-minute download. Modern download managers split the file into N parallel chunks, each fetched independently via its own Range request, then assembled in order at the client.
Components added:
- Client-side download manager: Determines the chunk count and byte boundaries, issues N parallel Range requests, tracks per-chunk completion, assembles the final file.
- CDN Range responses: The CDN responds correctly to each independent Range request, potentially serving slices from its partial object cache.
Parallel download walkthrough:
- Client gets file_size = 10737418240 (10 GB) from the session response.
- Client chooses N=8 parallel parts. Chunk size = 10 GB / 8 = 1.25 GB.
- Client issues 8 simultaneous GETs with Range headers.
- CDN responds with 206 Partial Content for each part independently.
- Client writes each part to its own temp segment on disk.
- After all 8 parts complete, client assembles segments in order and validates the full-file SHA-256.
Each Range request is handled by the CDN independently. For popular files, the CDN caches byte slices and serves them from memory without touching S3. I'll treat the chunk management algorithm as a black box here and detail it in the parallel download deep dive.
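Without opening the black box fully, the chunk management can be sketched as a shared work queue with N workers, the HTTP fetch injected as a function so the queue logic stands alone. Function names and the CRC32 retry policy are illustrative; a production client would cap retries and write parts to disk rather than memory:

```python
import queue
import threading
import zlib

def plan_chunks(total_size: int, chunk_size: int):
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]

def parallel_download(total_size, chunk_size, n_workers, fetch_range, crc_of=None):
    """Download via a shared chunk queue.

    fetch_range(start, end) returns the bytes for 'Range: bytes={start}-{end}'.
    All workers pull from one queue, so a slow worker simply completes fewer
    chunks while fast workers take more (the work-stealing property). If
    crc_of(index) is supplied, each chunk is CRC32-validated on arrival and
    re-enqueued on mismatch (a real client would cap these retries).
    """
    work = queue.Queue()
    for index, byte_range in enumerate(plan_chunks(total_size, chunk_size)):
        work.put((index, byte_range))

    parts, lock = {}, threading.Lock()

    def worker():
        while True:
            try:
                index, (start, end) = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            data = fetch_range(start, end)
            if crc_of is not None and zlib.crc32(data) != crc_of(index):
                work.put((index, (start, end)))  # retry just this chunk
                continue
            with lock:
                parts[index] = data

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Assemble in order, as the client does when merging temp segments.
    return b"".join(parts[i] for i in sorted(parts))
```

The injected fetch_range also makes the planner easy to test against an in-memory blob before wiring it to real HTTP Range requests.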
Potential Deep Dives
1. How do we support pause and resume that survives client crashes?
Three questions sit under this: where is the resume offset stored, how precisely is it tracked, and how does the client recover after an unexpected shutdown?
2. How do we support parallel multi-part downloads for large files?
Parallel multi-part download can cut transfer time by up to a factor of N when a single TCP connection cannot saturate the path (high bandwidth-delay product, per-connection throttling, or loss-capped congestion windows). But naively implemented it stalls on a single slow part.
3. How do we scale to 10 M concurrent downloads?
4. How do we cache popular files at CDN edge to reduce origin load?
5. How do we ensure file integrity across chunks and devices?
Final Architecture
Interview Cheat Sheet
- The core framing: This is a read-dominated system (up to 1,000,000:1 read-to-write for popular files). The entire design is a read optimization problem: move bytes as close to the client as possible, and keep origin invisible to most of the traffic.
- Pre-signed URLs are the key architectural insight: They separate authorization (control plane) from byte delivery (data plane). The API server authorizes once; the CDN delivers bytes without calling back to the application server. This is what allows 10 M concurrent downloads without 10 M goroutines on your API fleet.
- Range requests are not a special protocol: Pause-and-resume is just Range: bytes=X- issued from the last confirmed byte offset. The server must include Accept-Ranges: bytes in responses and handle any byte range correctly.
- Client confirms receipt, server stores it: Resumable downloads require server-side bytes_confirmed storage. Client-only storage fails on cross-device resume and loses state on app crash. The server is the authoritative source of the resume offset.
- Parallel multi-part with work-stealing: Fixed N parts suffer from stragglers. Use a concurrent work queue of 10 MB chunks with N worker goroutines that steal from the queue. Slow workers process fewer chunks; fast workers take more. Validate each chunk with CRC32 immediately; re-enqueue failed chunks without restarting the full download.
- CDN cache key normalization is mandatory: Strip expires and sig query params from the CDN cache key. Otherwise every unique pre-signed URL is a cache miss and the effective hit rate is 0% even for the most popular files.
- Partial object caching: Enable it at the CDN so byte-range responses (206) are cached as independent slices. Edge PoPs build coverage of popular file byte ranges over time without needing to cache the full file.
- Two-tier CDN (edge + shield): Edge PoPs are small and fast with high eviction rates. The shield PoP is large and medium-latency, catching edge misses before they hit S3. Files larger than 10 GB skip the edge PoP cache entirely and route through the shield directly.
- Session TTL and URL expiry: Session TTL = 72 hours. Pre-signed URL TTL = 12-24 hours (1-4 hours for sensitive content). A 403 mid-download means the URL expired; the client calls POST /downloads for a fresh URL and resumes from bytes_confirmed. A 416 Range Not Satisfiable means bytes_confirmed exceeds the current file size (file was replaced); treat it like a 403 and restart from byte 0.
- Integrity tradeoff: Per-chunk SHA-256 for operational integrity (catches accidental corruption). Full-file SHA-256 as the final gate check. Merkle tree for adversarial chain-of-custody (mention in senior/staff interviews; per-chunk checksums over TLS are sufficient for most practical deployments).
- Numbers to remember: 10 M concurrent × 50 Mbps = 500 Tbps aggregate egress (impossible without a CDN). 50 GB / 10 MB = 5,120 chunks, a 160 KB chunk manifest. HMAC validation at the CDN edge: ~100 ns, not a latency bottleneck.
- Avoid saying: "stream bytes through the API server" or "cache the full file at each edge PoP" (large files kill edge caches). Never use CDN-side bytes_sent as the resume offset (sent does not equal received). Never forget to normalize the CDN cache key (pre-signed URL params break caching).