Design Instagram
Design Instagram's photo upload, hybrid fan-out feed, and CDN delivery for 500M DAU, covering the media pipeline and petabyte-scale Cassandra storage.
What is Instagram?
Instagram is a photo-sharing social network where users upload images, follow other accounts, and scroll through a personalized feed of photos from people they follow. The apparent core is simple: upload a photo, show it to followers. The hard part is everything underneath.
Photos need resizing into multiple resolutions before delivery, durable storage at petabyte scale, and global serving under 100ms. The feed must merge photos from hundreds of followed accounts for 500 million daily users without touching a database on every scroll. I start every Instagram design by separating the upload write path from the fan-out read path, because they have nothing in common and mixing them creates the worst failure modes at scale. That separation is the single highest-leverage decision in this entire design.
Functional Requirements
Core Requirements
- Users can upload a photo with a caption.
- Users can follow and unfollow other users.
- Users can view their home feed: reverse-chronological photos from accounts they follow.
- Users can view a profile page: the photo grid of all posts by a specific user.
Below the Line (out of scope)
- Engagement features (likes, comments, reactions) and content discovery (Explore, search)
- Stories (24-hour ephemeral photos)
- Reels (short-form video)
- Direct messages
The hardest part in scope: Generating the home feed. At 500M DAU each refreshing their feed 10 times per day, assembling it on demand by querying across all followed users is not viable. Pre-computing feeds and delivering photos globally via CDN under 100ms is the axis on which the entire design turns.
Engagement features (likes, comments, reactions) and content discovery (Explore, search) are below the line because they do not change the upload or feed delivery paths. To add likes, I would store a post_likes table keyed by (post_id, user_id) and cache the per-post count in Redis. To add search, I would run a separate Elasticsearch cluster that consumes post creation events from Kafka and indexes captions for full-text search.
Stories are below the line because they introduce a separate ephemeral storage lifecycle and a dedicated stories feed that does not interact with the home feed pipeline. To add them, I would store story metadata with a 24-hour TTL in Redis and reuse the same S3 and CDN path for media delivery.
Reels introduce video transcoding, converting the upload pipeline into a multi-step encoding job. To add them, I would extend the async media processing pipeline (covered in deep dive 1) with a video transcoder and an Adaptive Bitrate (ABR) manifest generation step alongside the image resizing workers.
Direct messages require a separate real-time messaging system. To add them, I would use WebSocket connections through a dedicated chat service backed by a Cassandra message store, entirely separate from the feed and media systems.
Non-Functional Requirements
Core Requirements
- Availability: 99.99% uptime. Availability over consistency for feeds: a feed missing the last 30 seconds of posts is acceptable; a failed feed load is not.
- Durability: Photos are never lost. S3 provides 11-nines durability. No uploaded photo can be silently dropped or corrupted.
- Latency: Home feed loads under 200ms p99. Photo delivery (the image bytes) completes under 100ms from any major geography. Upload acknowledgment completes under 1 second.
- Scale: 2B registered users, 500M DAU. Approximately 100M photos uploaded per day (~1,160 uploads per second, peaking at ~3,500 per second during events).
- Read throughput: Each active user loads their feed ~10 times per day. That is 5B feed loads per day, ~58K per second peaking at ~175K per second. Each feed load fetches 12 photos.
Below the Line
- Sub-10ms photo delivery via CDN edge-node pre-warming
- Real-time like-count consistency in feed
Read/write ratio: For every 1 photo uploaded, expect roughly 600 photo views (100M uploads per day against 5B feed loads at 12 photos each, or 60B photo views per day). But the more important number is write amplification on the feed cache. With an average of 300 followers per posting account, each photo upload triggers about 300 feed cache writes. That is 30B feed-cache updates per day for 100M uploads. This fan-out multiplier, not the raw upload rate, drives the infrastructure decisions in this article.
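The arithmetic behind these numbers is worth writing out once. The sketch below recomputes the rates from the stated inputs; the 3x peak multiplier is my assumption, chosen because it roughly reproduces the quoted peaks.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the requirements above.
dau = 500_000_000
uploads_per_day = 100_000_000
feed_loads_per_user_per_day = 10
photos_per_feed_load = 12
avg_followers = 300

# Assumption: peak traffic is about 3x the daily average.
peak_multiplier = 3

uploads_per_sec = uploads_per_day / SECONDS_PER_DAY
feed_loads_per_day = dau * feed_loads_per_user_per_day
feed_loads_per_sec = feed_loads_per_day / SECONDS_PER_DAY

# Read/write ratio: photo views per uploaded photo.
photo_views_per_day = feed_loads_per_day * photos_per_feed_load
views_per_upload = photo_views_per_day / uploads_per_day

# Write amplification: one feed-cache write per follower per upload.
fanout_writes_per_day = uploads_per_day * avg_followers
fanout_writes_per_sec = fanout_writes_per_day / SECONDS_PER_DAY

print(f"uploads/sec avg:       {uploads_per_sec:,.0f}")
print(f"uploads/sec peak:      {uploads_per_sec * peak_multiplier:,.0f}")
print(f"feed loads/sec avg:    {feed_loads_per_sec:,.0f}")
print(f"feed loads/sec peak:   {feed_loads_per_sec * peak_multiplier:,.0f}")
print(f"photo views per upload: {views_per_upload:,.0f}")
print(f"fan-out writes/sec avg: {fanout_writes_per_sec:,.0f}")
```

The last line is the one to remember: 30B fan-out writes per day averages out to roughly 347K feed-cache writes per second, about 300 times the raw upload rate.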
I target 200ms p99 for feed loads and accept eventual consistency on the feed: a user missing the last 30 seconds of posts is a better outcome than a failed page load. That 200ms target rules out assembling the feed on the read path by querying the database for each followed user. The 1-second upload acknowledgment budget requires decoupling the media processing pipeline from the upload response, and the 99.99% availability target means no single-point-of-failure components in the hot read path.
Core Entities
- Post: The core content entity. Carries a post_id, user_id, caption, media_keys (the S3 object keys for each processed resolution), media_status, and created_at. The media_keys are populated asynchronously after processing completes.
- User: An account with a profile, follower_count, and following_count. The follower_count field drives the influencer threshold check in the fan-out strategy.
- Follow: A directed edge from follower to followee. The follow graph is the input to every home feed generation and fan-out operation in the system.
- Feed (derived): A pre-computed, ordered list of post IDs cached per user. It is derived from posts and the follow graph rather than stored as a source-of-truth entity.
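As a rough illustration, the three stored entities could be modeled as below. This is a sketch of the fields named above, not the full schema: the type choices, the "ready" status value, and the mark_processed helper are my assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass
class Post:
    post_id: int
    user_id: int
    caption: str
    created_at: datetime
    # "pending" comes from the upload flow; any other status value is an assumption.
    media_status: str = "pending"
    # Resolution label -> S3 object key, empty until processing completes.
    media_keys: Dict[str, str] = field(default_factory=dict)


@dataclass
class User:
    user_id: int
    follower_count: int = 0   # drives the influencer threshold check
    following_count: int = 0


@dataclass(frozen=True)
class Follow:
    follower_id: int   # the account doing the following
    followee_id: int   # the account being followed


def mark_processed(post: Post, keys: Dict[str, str]) -> None:
    """Hypothetical helper: record processed media keys on the post."""
    post.media_keys = dict(keys)
    post.media_status = "ready"  # assumed status name


post = Post(post_id=1, user_id=42, caption="hello",
            created_at=datetime.now(timezone.utc))
mark_processed(post, {"thumb": "processed/1/thumb.jpg"})
```

The Feed has no class here on purpose: it is just a cached list of post_id values per user.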
The full schema, index strategy, and partition keys are deferred to the deep dives. The four entities above are sufficient to drive the API design and High-Level Design.
API Design
I use a two-phase upload rather than a multipart form POST to the app server because it keeps binary image bytes off the application fleet entirely, which is the right call at 3,500 uploads per second peak. I cannot stress this enough: letting the app server proxy image bytes is the most common mistake I see in Instagram design interviews.
Upload a photo:
POST /posts
Body: { caption, media_type }
Response: { post_id, upload_url }
Acknowledge upload complete:
PUT /posts/{post_id}/media
Body: { upload_confirmed: true }
Response: 202 Accepted
Get home feed:
GET /feed/home
Query: { cursor?, limit? }
Response: { posts: [...], next_cursor }
Get profile posts:
GET /users/{user_id}/posts
Query: { cursor?, limit? }
Response: { posts: [...], next_cursor }
Follow a user:
POST /users/{user_id}/follows
Response: 201 Created
Unfollow a user:
DELETE /users/{user_id}/follows
Response: 204 No Content
Two-phase upload: Photo uploads use a two-phase pattern. The first call, POST /posts, generates a pre-signed S3 URL and a post_id without touching media storage. The client uploads directly to S3 using the signed URL. The second call, PUT /posts/{post_id}/media, signals the server that the upload is complete, triggering the async processing pipeline. This keeps large binary transfers off the application servers entirely.
Cursor pagination: All feed endpoints use cursor-based pagination rather than offset. A user's feed changes while they scroll as new posts arrive. Offset pagination skips or repeats posts when items are inserted at the top. A cursor encodes the last-seen post_id, and each subsequent page returns only posts strictly older than that ID, so insertions at the top cannot shift the page boundary.
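A small in-memory example makes the difference concrete. The function names and the list-backed feed are illustrative assumptions; the feed is a list of post IDs sorted newest first.

```python
from typing import List, Optional, Tuple


def page_by_offset(feed: List[int], offset: int, limit: int) -> List[int]:
    """Offset pagination: position-based, so it drifts when the feed grows."""
    return feed[offset:offset + limit]


def page_by_cursor(
    feed: List[int], cursor: Optional[int], limit: int
) -> Tuple[List[int], Optional[int]]:
    """Cursor pagination: return posts strictly older than the cursor ID."""
    older = [pid for pid in feed if cursor is None or pid < cursor]
    page = older[:limit]
    next_cursor = page[-1] if page else None
    return page, next_cursor


feed = [50, 40, 30, 20, 10]  # newest first

first_by_offset = page_by_offset(feed, 0, 2)
first_by_cursor, cursor = page_by_cursor(feed, None, 2)

# A new post arrives at the top while the user is reading page one.
feed.insert(0, 60)

second_by_offset = page_by_offset(feed, 2, 2)          # post 40 shows up again
second_by_cursor, _ = page_by_cursor(feed, cursor, 2)  # continues after post 40

print(first_by_offset, second_by_offset)
print(first_by_cursor, second_by_cursor)
```

With offsets, the second page repeats post 40 because every item shifted down by one. The cursor version picks up at post 30 regardless of what was inserted above it.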
High-Level Design
1. Users can upload a photo
The write path: client requests a pre-signed URL, uploads image bytes directly to S3, then confirms the upload. The Post Service never touches the image bytes.
Components:
- Client: Mobile or web app initiating the two-phase upload flow.
- Post Service: Validates the request, generates a post_id, issues a pre-signed S3 URL, inserts the post row with media_status = pending, and publishes a MediaUploadedEvent on confirmation.
- Object Storage (S3): Receives the raw binary upload directly from the client.
- Post DB: Stores the post metadata row. Media keys are populated asynchronously after processing.
Request walkthrough:
- Client sends POST /posts with caption and media type.
- Post Service generates a post_id and inserts { post_id, user_id, caption, media_status: "pending", created_at } into Post DB.
- Post Service generates a pre-signed S3 URL valid for 5 minutes and returns { post_id, upload_url }.
- Client uploads image bytes directly to S3 using the pre-signed URL.
- Client sends PUT /posts/{post_id}/media to confirm the upload is complete.
- Post Service publishes MediaUploadedEvent { post_id, s3_raw_key } to Kafka.
- Post Service returns 202 Accepted.
The media processing pipeline that resizes and optimizes the uploaded image is deferred to deep dive 1. Only the upload and acknowledgment path is shown here.
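The upload walkthrough above can be sketched with in-memory stand-ins. Everything infrastructural here is an assumption made for illustration: a dict plays the Post DB, a list plays the Kafka topic, _presign_put returns a fake URL instead of calling S3, and the ownership check and 404 response are my additions.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Dict, List

PRESIGN_TTL_SECONDS = 300  # 5 minutes, per the walkthrough


@dataclass
class PostService:
    post_db: Dict[int, dict] = field(default_factory=dict)  # stand-in for Post DB
    event_log: List[dict] = field(default_factory=list)     # stand-in for Kafka
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create_post(self, user_id: int, caption: str, media_type: str) -> dict:
        """Phase 1: insert a pending row and hand back a pre-signed URL."""
        post_id = next(self._ids)
        raw_key = f"raw/{post_id}"  # hypothetical key layout
        self.post_db[post_id] = {
            "post_id": post_id,
            "user_id": user_id,
            "caption": caption,
            "media_type": media_type,
            "media_status": "pending",
            "s3_raw_key": raw_key,
            "created_at": time.time(),
        }
        return {"post_id": post_id, "upload_url": self._presign_put(raw_key)}

    def confirm_upload(self, post_id: int, user_id: int) -> int:
        """Phase 2: publish the event and return an HTTP-style status code."""
        post = self.post_db.get(post_id)
        if post is None or post["user_id"] != user_id:
            return 404  # assumed behavior for unknown or foreign posts
        self.event_log.append({
            "type": "MediaUploadedEvent",
            "post_id": post_id,
            "s3_raw_key": post["s3_raw_key"],
        })
        return 202

    def _presign_put(self, key: str) -> str:
        # Placeholder only: a real service would ask S3 to sign this URL.
        expires = int(time.time()) + PRESIGN_TTL_SECONDS
        return f"https://uploads.example.invalid/{key}?expires={expires}"


service = PostService()
created = service.create_post(user_id=42, caption="sunset", media_type="image/jpeg")
# ... the client uploads bytes directly to created["upload_url"] ...
status = service.confirm_upload(created["post_id"], user_id=42)
```

Note that neither method ever receives image bytes. The service only handles metadata and a small event, which is the point of the two-phase design.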
2. Users can view a profile page
I treat the profile page as the simpler read case before tackling the home feed merging problem. In an interview, solve this one first to build confidence before the harder fan-out discussion. A database index on (user_id, post_id) is the only structure needed.
Components:
- Post Service (updated): Serves profile page reads. Queries the Post DB using the user_id plus a cursor.
- Post DB (updated): Index on (user_id, post_id) enables efficient per-user queries. Since post_id encodes creation time (Snowflake; covered in deep dive 4), this index gives chronological order without a separate timestamp index.
Request walkthrough:
- Client sends GET /users/{user_id}/posts?limit=12.
- Post Service queries: SELECT * FROM posts WHERE user_id = ? AND post_id < ? ORDER BY post_id DESC LIMIT 12, binding the second parameter to the cursor. On the first page there is no cursor, so the post_id condition is omitted.
- Post Service returns the post list with a cursor encoding the last post_id.
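To see the index and query working together, here is a minimal sketch using SQLite as a stand-in for the Post DB. The table layout, the index name, and the profile_page helper are assumptions for illustration; a production store would differ.

```python
import sqlite3
from typing import List, Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "post_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, caption TEXT)"
)
# Composite index: equality on user_id, range scan on post_id.
conn.execute("CREATE INDEX idx_posts_user_post ON posts (user_id, post_id)")

# Sample data: odd post IDs belong to user 7, even ones to user 8.
conn.executemany(
    "INSERT INTO posts (post_id, user_id, caption) VALUES (?, ?, ?)",
    [(i, 7 if i % 2 else 8, f"post {i}") for i in range(1, 11)],
)


def profile_page(
    db: sqlite3.Connection, user_id: int,
    cursor: Optional[int] = None, limit: int = 12,
) -> Tuple[List[tuple], Optional[int]]:
    """Return one page of a user's posts, newest first, plus the next cursor."""
    sql = "SELECT post_id, caption FROM posts WHERE user_id = ?"
    params: list = [user_id]
    if cursor is not None:
        sql += " AND post_id < ?"  # only posts older than the last one seen
        params.append(cursor)
    sql += " ORDER BY post_id DESC LIMIT ?"
    params.append(limit)
    rows = db.execute(sql, params).fetchall()
    # A short page means there is nothing older left to fetch.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


page1, c1 = profile_page(conn, user_id=7, limit=2)
page2, c2 = profile_page(conn, user_id=7, cursor=c1, limit=2)
page3, c3 = profile_page(conn, user_id=7, cursor=c2, limit=2)
```

Because post_id is time-ordered, sorting by it descending is the same as sorting by creation time, so no second index is needed.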
Profile is a single-user read. Home feed requires merging posts across all followed accounts, which the next two requirements address.
3. Users can follow and unfollow other users
The follow graph powers every home feed. It must answer two questions fast: who do I follow (for reading my home feed) and who follows me (for fan-out when I post). Both directions must be O(1) per lookup.
Components:
- Follow Service: Handles POST and DELETE on follow relationships. Writes both directions of the adjacency graph on every operation.
- Follow Store: Keyed adjacency lists in both directions: follower_id → [followee_ids] and followee_id → [follower_ids].
Request walkthrough:
- Client sends POST /users/{followee_id}/follows.
- Follow Service writes (follower_id, followee_id) in the forward direction and (followee_id, follower_id) in the reverse direction into the Follow Store.
- Follow Service returns 201 Created.
Maintaining both directions doubles the write cost on follow and unfollow. The payoff is O(1) reads for the two access patterns that run on every post write and every feed load. Computing one direction from the other at query time would require a full-table scan across billions of edges.
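A minimal in-memory sketch of the dual adjacency lists looks like this. The class and method names are illustrative assumptions, and rejecting self-follows is my addition. In a distributed store the two writes would not be atomic, which this sketch does not model.

```python
from collections import defaultdict
from typing import Dict, Set


class FollowStore:
    """Keeps both directions of the follow graph so each lookup is one key read."""

    def __init__(self) -> None:
        self.following: Dict[int, Set[int]] = defaultdict(set)  # follower -> followees
        self.followers: Dict[int, Set[int]] = defaultdict(set)  # followee -> followers

    def follow(self, follower_id: int, followee_id: int) -> bool:
        if follower_id == followee_id:
            return False  # assumption: self-follows are rejected
        if followee_id in self.following[follower_id]:
            return False  # already following, nothing to write
        # Two writes per follow: forward edge and reverse edge.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)
        return True

    def unfollow(self, follower_id: int, followee_id: int) -> bool:
        if followee_id not in self.following.get(follower_id, set()):
            return False
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)
        return True

    def followees_of(self, user_id: int) -> Set[int]:
        """Who do I follow? Used when reading the home feed."""
        return set(self.following.get(user_id, ()))

    def followers_of(self, user_id: int) -> Set[int]:
        """Who follows me? Used for fan-out when I post."""
        return set(self.followers.get(user_id, ()))


store = FollowStore()
store.follow(1, 2)
store.follow(3, 2)
print(store.followers_of(2), store.followees_of(1))
```

Each read is a single key lookup in one of the two maps, at the cost of two writes on every follow and unfollow.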
4. Users can view their home feed