Netflix
Walk through a complete Netflix video streaming design: from encoding one upload into 1,200 adaptive variants to serving tens of millions of concurrent viewers through ISP-embedded CDN appliances with sub-2-second start times.
What is Netflix?
Netflix is a video streaming platform with 220 million subscribers across 190 countries. The interesting engineering challenge is not storing petabytes of video; S3 handles that. It is encoding each uploaded title into 1,200 adaptive format variants, distributing those variants to ISP-embedded appliances so viewers never touch the origin, and selecting the right variant every 4 seconds to keep a 4G phone and a 4K TV both buffer-free on the same film.
This is a rich interview question because it spans four genuinely hard problems: a parallel video encoding pipeline, a proprietary CDN tier called Open Connect, an adaptive bitrate protocol running on the client, and a recommendation system that personalizes a catalog of 15,000 titles for 220 million distinct users.
Functional Requirements
Core Requirements
- Users can browse and search the video catalog.
- Users can stream a video with adaptive quality that does not buffer.
- Users receive personalized content recommendations.
- Users can resume watching from where they left off on any device.
Below the Line (out of scope)
- Content upload and studio ingestion
- Offline download for mobile
- Multi-profile management and parental controls
- Live broadcasting and event streaming
The hardest part in scope: Serving tens of millions of concurrent viewers without buffering (the scale requirement below puts peak concurrency at roughly 30 to 50M streams, out of roughly 100M daily active users). Every viewer fetches a video segment every 4 seconds. The origin (S3) cannot absorb that read volume directly. The solution is layered: ISP-embedded appliances serve the vast majority of traffic within each city, a tiered CDN catches the rest, and the origin only handles cache misses. Getting that hierarchy right is the core of this design.
Content upload is below the line because it is a write-once, offline pipeline that does not interact with the playback path. To add it, I would build a studio-facing ingest API that accepts high-resolution source files, deposits them in S3, and emits an EncodeVideoEvent to Kafka. The encoding pipeline covered in the deep dives would pick it up from there.
Offline downloads are below the line because they require DRM key management (Netflix uses Widevine, PlayReady, and FairPlay per platform). The download itself is a CDN prefetch operation, but the license server integration adds substantial complexity that changes none of the streaming design. To add it: integrate a Download License Service that issues time-bounded offline playback keys scoped to a single device and content item. The CDN segment prefetch path already exists; the missing pieces are that License Service and a client-side encrypted storage mechanism.
Multi-profile management is below the line because it is a user account feature orthogonal to the streaming path. Each profile is a child entity on the User record with its own recommendation scores, watch history, and content restrictions. To add it: extend the User entity with a Profile child and scope all WatchSession, recommendation cache keys, and entitlement checks to profile_id instead of user_id. No streaming path components change.
Live broadcasting is below the line because it uses a fundamentally different delivery protocol (RTMP or SRT ingest, LL-HLS or WebRTC egress) with sub-second latency constraints that override the VOD caching model entirely. The system here is VOD-optimized. To add it: build a separate live ingest pipeline that segments the incoming stream into LL-HLS chunks of under 2 seconds, and route those chunks through a dedicated low-latency CDN path. Live segments go stale almost immediately and must not be cached aggressively; the standard OCA pre-population model does not apply.
Non-Functional Requirements
Core Requirements
- Availability: 99.99% uptime. Netflix is entertainment, and outages are front-page news. Availability over consistency: a stale recommendation is acceptable; a playback failure is not.
- Latency: Video starts within 2 seconds of clicking play (time to first byte for the first segment). Catalog browse and search under 200ms p99.
- Scale: 220M subscribers, roughly 100M DAU. Netflix sustains roughly 15% of global downstream internet bandwidth at peak hours. Peak concurrent streams are roughly 30 to 50M at any given moment.
- Buffering rate: Under 0.1% of playback time. Viewers tolerate minor quality drops; they do not tolerate buffering spinners.
- Storage: Each title generates roughly 1TB of encoded variants (multiple resolutions, codecs, audio tracks). At 15,000 titles, total catalog storage is roughly 15 petabytes.
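The scale and storage figures above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch: the inputs are the numbers quoted in the requirements, the function names are my own, and the request rate assumes every concurrent stream fetches exactly one segment every 4 seconds.

```python
# Back-of-envelope figures; every input is a number quoted in the requirements.
TITLES = 15_000
TB_PER_TITLE = 1.0                      # roughly 1 TB of encoded variants per title
PEAK_CONCURRENT_STREAMS = 50_000_000    # upper end of the 30 to 50M range
SEGMENT_SECONDS = 4                     # one segment fetched every 4 seconds


def catalog_storage_pb(titles: int, tb_per_title: float) -> float:
    """Total encoded catalog size in petabytes (1 PB = 1,000 TB)."""
    return titles * tb_per_title / 1_000


def segment_requests_per_second(streams: int, segment_seconds: int) -> float:
    """Steady-state segment fetch rate if each stream requests one segment per interval."""
    return streams / segment_seconds


if __name__ == "__main__":
    print(catalog_storage_pb(TITLES, TB_PER_TITLE), "PB of encoded video")
    print(segment_requests_per_second(PEAK_CONCURRENT_STREAMS, SEGMENT_SECONDS),
          "segment requests per second at peak")
```

At the upper end of the concurrency range this works out to 12.5 million segment requests per second, which is the read volume the origin cannot absorb on its own.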
Below the Line
- Sub-50ms start time for high-speed fixed broadband connections
- Per-language subtitle and audio track management at encoding time
- Device-specific DRM enforcement and key rotation
Sub-50ms start time is below the line because achieving it requires speculative segment prefetch before the viewer clicks play, which demands an ML pipeline predicting next-play candidates from browse behavior and pre-warming OCA edges proactively. To add it: instrument browse dwell-time events, train a lightweight next-play predictor, and issue CDN prefetch directives for the predicted first 3 segments.
Per-language subtitle and audio track encoding is below the line because it runs on a parallel track to video encoding and adds no complexity to the delivery path described here. To add it: route each language audio and subtitle file through the same Kafka job queue with a dedicated worker pool; the Job Coordinator assembles all tracks into the final HLS manifest on completion.
Device-specific DRM enforcement is below the line because it requires integrating three separate license servers (Widevine for Android and Chrome, PlayReady for Windows, FairPlay for Apple). To add it: route the manifest endpoint through a License Service that issues short-lived playback tokens per session and device type. The manifest endpoint already returns a drm_license_url in the API design; the License Service backs that URL.
Read/write ratio: This is among the most read-skewed systems in existence. Each new title is written once, fanning out into 1,200+ encoding jobs that produce the variants. After encoding, the content is effectively immutable: every segment request from every viewer is a read, and content rarely changes after publication. On the serving path, reads outnumber writes by thousands to one, which justifies aggressive caching at every tier of the delivery chain.
I target 2-second start time as the key latency number. The first video segment is typically 2 to 4 seconds of content. The manifest file tells the client player where to find each segment. Fetching the manifest plus the first segment from an ISP-embedded appliance takes under 500ms on most fixed broadband connections, leaving margin for player initialization and DRM license fetch.
Core Entities
- Video: A piece of content with metadata (title, genre, cast, description, rating) and a reference to its set of encoded variants in S3.
- EncodingVariant: A specific rendition of a video at a given resolution, bitrate, and codec (H.264 1080p at 8 Mbps, H.265 4K at 16 Mbps, etc.). Each variant maps to a directory of segment files and a manifest in S3.
- User: Account with a subscription tier (Standard, 4K), regional content entitlements, and a reference to their recommendation scores.
- WatchSession: A viewer's current playback state: last position in milliseconds, device ID, and selected quality profile.
Schema design, partition keys, and indexes are deferred to the deep dives. The four entities above are sufficient to drive the API design and High-Level Design.
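To make the four entities concrete before the API design, here is a minimal sketch of them as Python dataclasses. The field names, types, and example values are illustrative assumptions drawn from the descriptions above, not a schema; storage layout and indexing remain deferred.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EncodingVariant:
    video_id: str
    resolution: str       # e.g. "1080p"
    codec: str            # e.g. "h264"
    bitrate_kbps: int     # e.g. 8_000 for 8 Mbps
    manifest_key: str     # hypothetical S3 key of this variant's manifest


@dataclass
class Video:
    video_id: str
    title: str
    genre: str
    rating: str
    variants: list[EncodingVariant] = field(default_factory=list)


@dataclass
class User:
    user_id: str
    subscription_tier: str   # "standard" or "4k"
    region: str              # drives regional content entitlements


@dataclass
class WatchSession:
    session_id: str
    user_id: str
    video_id: str
    device_id: str
    position_ms: int = 0            # last playback position
    quality_profile: str = "auto"   # hypothetical default profile name
```

EncodingVariant is frozen because a rendition never changes once encoded, while WatchSession is mutable because its position is updated throughout playback.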
API Design
Netflix uses REST for catalog and session operations and a manifest-driven protocol (HLS/DASH) for video delivery. The manifest is the critical contract between the server and the client player.
Browse the catalog:
GET /catalog?page_token={cursor}&limit=50
Response: { items: [...], next_cursor }
Search titles:
GET /search?q={query}&page_token={cursor}&limit=20
Response: { results: [...], next_cursor }
Get video manifest URL:
GET /videos/{video_id}/manifest
Response: { manifest_url, drm_license_url }
Update watch position:
PUT /watch-sessions/{session_id}/position
Body: { position_ms, device_id }
Response: 204 No Content
Fetch recommendations:
GET /recommendations?limit=20
Response: { videos: [...] }
Fetch resume position:
GET /videos/{video_id}/resume-position
Response: { position_ms, device_id, updated_at }
The manifest endpoint returns a URL pointing directly to the manifest file on the CDN, not the manifest content itself. The client player fetches the manifest from the CDN edge, which is geographically close and already has the file cached. Returning the manifest inline through the API server would route large payloads through application servers that have no reason to see them.
Cursor-based pagination on /catalog and /search is required rather than offset-based. With 15,000 titles and frequent metadata updates, offset-based pages drift as titles are added mid-browse. Cursors are stable across inserts.
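Why cursors stay stable across inserts is easier to see in code. The sketch below is a minimal keyset-pagination example over an in-memory sorted list, standing in for an indexed query of the form WHERE video_id > :after ORDER BY video_id LIMIT :limit. The cursor encoding (base64-wrapped JSON) and the function names are assumptions for illustration; the API above only specifies that the cursor is an opaque token.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: str) -> str:
    """Wrap the last-seen sort key in an opaque, URL-safe token."""
    raw = json.dumps({"after": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> str:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def list_catalog(video_ids: list[str], page_token: Optional[str], limit: int) -> dict:
    """video_ids must be sorted ascending (the stand-in for an ordered index)."""
    after = decode_cursor(page_token) if page_token else None
    # Resume strictly after the last item the client saw, not at a row offset.
    remaining = [v for v in video_ids if after is None or v > after]
    page = remaining[:limit]
    next_cursor = encode_cursor(page[-1]) if len(remaining) > limit else None
    return {"items": page, "next_cursor": next_cursor}
```

If a new title sorts ahead of the current position between two requests, an offset of 2 would repeat an item the client already saw, whereas the cursor resumes after the last returned key and is unaffected.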
High-Level Design
1. Users can browse and search the video catalog
The catalog read path: the client fetches paginated metadata from the Catalog Service, which reads from a PostgreSQL read replica with a Redis cache in front for hot content.
The catalog is a read-dominated, low-write workload. A film's metadata rarely changes after publication. Cache hit rates are high because a small set of popular titles drives the majority of browse impressions.
I'd draw this box first in any Netflix interview because it is the simplest path in the system and establishes the gateway-service-cache-DB pattern that every subsequent requirement builds on.
Components:
- Client: Web or mobile app sending paginated GET requests to the API Gateway.
- API Gateway: Terminates TLS, validates the session token, and routes requests to downstream services.
- Catalog Service: Stateless service that reads video metadata and assembles the paginated response.
- Metadata Cache (Redis): LRU cache keyed by video_id. TTL of 1 hour. Cache hit rate exceeds 95% for catalog browse because users repeatedly see the same popular titles.
- Metadata DB (PostgreSQL): Source of truth for all video metadata. A read replica handles all catalog read traffic. The primary handles metadata writes only.
Request walkthrough:
- Client sends GET /catalog?page_token={cursor}&limit=50.
- API Gateway validates the Bearer token and routes to the Catalog Service.
- Catalog Service checks Redis for cached metadata entries. Returns cached items directly for hits.
- On cache miss, Catalog Service queries the PostgreSQL read replica and populates Redis before returning.
- Catalog Service returns the paginated list with a next_cursor.
This diagram covers catalog reads only. The streaming path in the next requirement uses a separate CDN delivery flow that bypasses the Catalog Service entirely.
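The catalog read path in the walkthrough (check the cache, fall back to the read replica, populate the cache) is the cache-aside pattern. Here is a minimal sketch of it, assuming an in-memory TTLCache class as a stand-in for Redis and a load_from_replica callable as a stand-in for the PostgreSQL query; both names are hypothetical, and the sketch omits LRU eviction.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis with per-entry expiry (no LRU eviction)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def get_metadata(video_id: str, cache: TTLCache,
                 load_from_replica: Callable[[str], dict]) -> dict:
    """Cache-aside read: serve hits from the cache, fill it on a miss."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    row = load_from_replica(video_id)   # miss: go to the read replica
    cache.set(video_id, row)            # populate before returning
    return row
```

With a one-hour TTL, TTLCache(ttl_seconds=3600), a popular title reaches the replica at most once per hour per cache entry, which is how a hit rate above 95% becomes plausible for a skewed workload.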
2. Users can stream a video with adaptive quality
The streaming path: the client fetches a manifest URL from the Manifest Service, then fetches video segments directly from the CDN edge closest to it. The origin (S3) only serves requests that miss every CDN tier.
Netflix does not stream video through application servers. The API server's only role is to hand the client a manifest URL and a DRM license URL. All bytes of video travel through the CDN. I'd state this separation explicitly in the interview because it is the single most important insight: the API layer and the video delivery layer are completely decoupled.
Components:
- Manifest Service: Returns the CDN URL of the manifest file for a given video_id. Validates subscription tier to determine which quality variants are accessible.
- CDN (Open Connect Appliances): ISP-embedded file servers that serve the vast majority of traffic from inside the viewer's ISP network. Fallback CDN (CloudFront) handles remaining traffic.
- S3 (Origin): Source of truth for all encoded video segments and manifest files. The CDN pulls from S3 on cache miss. Viewers never access S3 directly.
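The relationship between these components is a nearest-first lookup chain with the origin as the last resort. The sketch below models that chain with plain dictionaries; the Tier class and fetch_segment function are hypothetical names, and the fill-on-the-way-back behavior is a simplifying assumption rather than a description of how Open Connect actually populates its caches.

```python
class Tier:
    """One cache tier, e.g. an ISP-embedded appliance or the fallback CDN."""

    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, bytes] = {}


def fetch_segment(key: str, tiers: list[Tier],
                  origin: dict[str, bytes]) -> tuple[bytes, str]:
    """Try each tier nearest-first; read the origin only if every tier misses.

    Returns the segment bytes and the name of the tier that served them.
    Raises KeyError if the segment does not exist at the origin.
    """
    missed: list[Tier] = []
    for tier in tiers:
        if key in tier.store:
            data, source = tier.store[key], tier.name
            break
        missed.append(tier)
    else:
        data, source = origin[key], "origin"
    # Fill every tier that missed so the next viewer is served closer to home.
    for tier in missed:
        tier.store[key] = data
    return data, source
```

The first viewer of a cold segment pays for the trip to the origin; every later viewer behind the same appliance is served from the first tier.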
Request walkthrough:
- Client sends GET /videos/{video_id}/manifest.
- Manifest Service validates the session and subscription tier. Returns { manifest_url, drm_license_url }.
- Client player fetches the manifest from the CDN edge. The manifest lists all available variant playlists at multiple bitrates.
- Client ABR algorithm selects the starting variant (default: medium quality for fast start).
- Client fetches 4-second video segments from the CDN. Every 4 seconds the ABR algorithm re-evaluates throughput and switches variants if needed.
- On segment cache miss: the CDN edge pulls the segment from S3 and caches it for subsequent viewers on the same node.
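The re-evaluation step in the walkthrough can be illustrated with a simple throughput-based rule: measure how fast the last segment downloaded, then pick the highest bitrate that fits within a safety margin. This is a deliberately naive sketch, not Netflix's actual ABR algorithm (production players typically also weigh buffer level). The bitrate ladder and the 0.8 safety factor are assumed values for illustration.

```python
# Hypothetical bitrate ladder in kbps; 8,000 and 16,000 match the 1080p and 4K
# examples from the entity description, the lower rungs are made up.
LADDER_KBPS = [235, 1_750, 3_000, 8_000, 16_000]


def measure_throughput_kbps(segment_bytes: int, download_seconds: float) -> float:
    """Observed throughput for the segment that was just downloaded."""
    return segment_bytes * 8 / 1_000 / download_seconds


def select_variant(throughput_kbps: float, ladder: list[int],
                   safety: float = 0.8) -> int:
    """Pick the highest bitrate that fits within a fraction of measured throughput.

    Falls back to the lowest rung when nothing fits, trading quality for
    continuity rather than stalling playback.
    """
    budget = throughput_kbps * safety
    eligible = [b for b in sorted(ladder) if b <= budget]
    return eligible[-1] if eligible else min(ladder)
```

A player would call measure_throughput_kbps after each 4-second segment and feed the result to select_variant before requesting the next one. The safety margin is what lets a viewer on a fluctuating 4G connection drop a rung before the buffer empties instead of after.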