๐Ÿ“HowToHLD
Vote for New Content
Vote for New Content
Home/High Level Design/System Design Questions

Design Instagram

Design Instagram's photo upload, hybrid fan-out feed, and CDN delivery for 500M DAU, covering the media pipeline and petabyte-scale Cassandra storage.

36 min read · 2026-03-28 · hard · system-design · social-media · media-storage · fan-out · cdn

What is Instagram?

Instagram is a photo-sharing social network where users upload images, follow other accounts, and scroll through a personalized feed of photos from people they follow. The core loop looks simple: upload a photo, show it to followers. The hard part is everything underneath.

Photos need resizing into multiple resolutions before delivery, durable storage at petabyte scale, and global serving under 100ms. The feed must merge photos from hundreds of followed accounts for 500 million daily users without touching a database on every scroll. I start every Instagram design by separating the upload write path from the fan-out read path, because they have nothing in common and mixing them creates the worst failure modes at scale.


Functional Requirements

Core Requirements

  1. Users can upload a photo with a caption.
  2. Users can follow and unfollow other users.
  3. Users can view their home feed: reverse-chronological photos from accounts they follow.
  4. Users can view a profile page: the photo grid of all posts by a specific user.

Below the Line (out of scope)

  • Engagement features (likes, comments, reactions) and content discovery (Explore, search)
  • Stories (24-hour ephemeral photos)
  • Reels (short-form video)
  • Direct messages

The hardest part in scope: Generating the home feed. At 500M DAU each refreshing their feed 10 times per day, assembling it on demand by querying across all followed users is not viable. Pre-computing feeds and delivering photos globally via CDN under 100ms is the axis on which the entire design turns.

Engagement features (likes, comments, reactions, Explore, and search) are below the line because they do not change the upload or feed delivery paths. To add likes, I would store a post_likes table keyed by (post_id, user_id) and cache the count in Redis per post. Search requires a separate Elasticsearch cluster consuming post creation events from Kafka for full-text caption indexing.

Stories are below the line because they introduce a separate ephemeral storage lifecycle and a dedicated stories feed that does not interact with the home feed pipeline. To add them, I would store story metadata with a 24-hour TTL in Redis and reuse the same S3 and CDN path for media delivery.

Reels introduce video transcoding, converting the upload pipeline into a multi-step encoding job. To add them, I would extend the async media processing pipeline (covered in deep dive 1) with a video transcoder and an Adaptive Bitrate (ABR) manifest generation step alongside the image resizing workers.

Direct messages require a separate real-time messaging system. To add them, I would use WebSocket connections through a dedicated chat service backed by a Cassandra message store, entirely separate from the feed and media systems.


Non-Functional Requirements

Core Requirements

  • Availability: 99.99% uptime. Availability over consistency for feeds: a feed missing the last 30 seconds of posts is acceptable; a failed feed load is not.
  • Durability: Photos are never lost. S3 provides 11-nines durability. No uploaded photo can be silently dropped or corrupted.
  • Latency: Home feed loads under 200ms p99. Photo delivery (the image bytes) completes under 100ms from any major geography. Upload acknowledgment completes under 1 second.
  • Scale: 2B registered users, 500M DAU. Approximately 100M photos uploaded per day (~1,160 uploads per second, peaking at ~3,500 per second during events).
  • Read throughput: Each active user loads their feed ~10 times per day. That is 5B feed loads per day, ~58K per second peaking at ~175K per second. Each feed load fetches 12 photos.

Below the Line

  • Sub-10ms photo delivery via CDN edge-node pre-warming
  • Real-time like-count consistency in feed

Read/write ratio: For every photo uploaded, expect roughly 600 photo views (100M uploads per day versus 5B feed loads fetching 12 photos each). The more important number, though, is write amplification on the feed cache. With an average of 300 follows per active user, each photo upload triggers up to 300 feed-cache writes, so 100M uploads per day produce 30B feed-cache updates per day. This fan-out multiplier, not the raw upload rate, drives the infrastructure decisions in this article.
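The arithmetic behind these figures can be checked in a few lines (all inputs come from the requirements above):

```python
SECONDS_PER_DAY = 86_400

uploads_per_day = 100_000_000
feed_loads_per_day = 5_000_000_000
photos_per_feed_load = 12
avg_follows = 300

# Average upload rate: ~1,160/s (peaks ~3x higher during events).
uploads_per_sec = uploads_per_day / SECONDS_PER_DAY

# Average feed-load rate: ~58K/s (peaks ~175K/s).
feed_loads_per_sec = feed_loads_per_day / SECONDS_PER_DAY

# Read/write ratio on photos: ~600 views per upload.
photo_views_per_upload = feed_loads_per_day * photos_per_feed_load / uploads_per_day

# Fan-out write amplification: 30B feed-cache writes per day.
feed_cache_writes_per_day = uploads_per_day * avg_follows
```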

I target 200ms p99 for feed loads and accept eventual consistency on the feed: a user missing the last 30 seconds of posts is a better outcome than a failed page load. That 200ms target rules out assembling the feed on the read path by querying the database for each followed user. The 1-second upload acknowledgment budget requires decoupling the media processing pipeline from the upload response, and the 99.99% availability target means no single-point-of-failure components in the hot read path.


Core Entities

  • Post: The core content entity. Carries a post_id, user_id, caption, media_keys (the S3 object keys for each processed resolution), media_status, and created_at. The media_keys are populated asynchronously after processing completes.
  • User: An account with a profile, follower_count, and following_count. The follower_count field drives the influencer threshold check in the fan-out strategy.
  • Follow: A directed edge from follower to followee. The follow graph is the input to every home feed generation and fan-out operation in the system.
  • Feed (derived): A pre-computed ordered list of post IDs cached per user. Not a stored entity.

The full schema, index strategy, and partition keys are deferred to the deep dives. The four entities above are sufficient to drive the API design and High-Level Design.


API Design

I use a two-phase upload rather than a multipart form POST to the app server because it keeps binary image bytes off the application fleet entirely, which is the right call at 3,500 uploads per second peak.

Upload a photo:

POST /posts
Body: { caption, media_type }
Response: { post_id, upload_url }

Acknowledge upload complete:

PUT /posts/{post_id}/media
Body: { upload_confirmed: true }
Response: 202 Accepted

Get home feed:

GET /feed/home
Query: { cursor?, limit? }
Response: { posts: [...], next_cursor }

Get profile posts:

GET /users/{user_id}/posts
Query: { cursor?, limit? }
Response: { posts: [...], next_cursor }

Follow a user:

POST /users/{user_id}/follows
Response: 201 Created

Unfollow a user:

DELETE /users/{user_id}/follows
Response: 204 No Content

Two-phase upload: Photo uploads use a two-phase pattern. The first POST /posts generates a pre-signed S3 URL and a post_id without touching media storage. The client uploads directly to S3 using the signed URL. The second PUT /posts/{id}/media signals the server that the upload is complete, triggering the async processing pipeline. This keeps large binary transfers off the application servers entirely.

Cursor pagination: All feed endpoints use cursor-based pagination rather than offset. A user's feed changes while they scroll as new posts arrive. Offset pagination skips or repeats posts when items are inserted at the top. A cursor encodes the last-seen post_id, and every subsequent page begins strictly after that ID.


High-Level Design

1. Users can upload a photo

The write path: client requests a pre-signed URL, uploads image bytes directly to S3, then confirms the upload. The Post Service never touches the image bytes.

Components:

  • Client: Mobile or web app initiating the two-phase upload flow.
  • Post Service: Validates the request, generates a post_id, issues a pre-signed S3 URL, inserts the post row with media_status = pending, and publishes a MediaUploadedEvent on confirmation.
  • Object Storage (S3): Receives the raw binary upload directly from the client.
  • Post DB: Stores the post metadata row. Media keys are populated asynchronously after processing.

Request walkthrough:

  1. Client sends POST /posts with caption and media type.
  2. Post Service generates a post_id and inserts { post_id, user_id, caption, media_status: "pending", created_at } into Post DB.
  3. Post Service generates a pre-signed S3 URL valid for 5 minutes and returns { post_id, upload_url }.
  4. Client uploads image bytes directly to S3 using the pre-signed URL.
  5. Client sends PUT /posts/{post_id}/media to confirm the upload is complete.
  6. Post Service publishes MediaUploadedEvent { post_id, s3_raw_key } to Kafka.
  7. Post Service returns 202 Accepted.
flowchart LR
  C(["👤 Client\nMobile / web app"])
  PS["⚙️ Post Service\nGenerate post_id · pre-sign S3 URL\nINSERT post (pending)\nPublish MediaUploadedEvent on confirm"]
  S3[("🗄️ S3 Object Storage\nRaw photo bytes\n11-nines durability\nDirect client upload")]
  PDB[("🗄️ Post DB\npost_id · user_id · caption\nmedia_status = pending")]
  MQ["📨 Kafka\nMediaUploadedEvent\nDecouples upload from processing\nAt-least-once delivery"]

  C -->|"POST /posts · caption"| PS
  PS -->|"INSERT post row (pending)"| PDB
  PS -->|"Returns post_id + pre-signed URL"| C
  C -->|"PUT image bytes (signed URL)"| S3
  C -->|"PUT /posts/{id}/media (confirm)"| PS
  PS -->|"Publish MediaUploadedEvent"| MQ

The media processing pipeline that resizes and optimizes the uploaded image is deferred to deep dive 1. Only the upload and acknowledgment path is shown here.
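The server side of the walkthrough above can be sketched as follows. This is a minimal illustration, not the real API: the HMAC pre-signing merely stands in for S3's SigV4 scheme (in practice boto3's `generate_presigned_url` does this for you), and the database insert is noted as a comment.

```python
import hashlib
import hmac
import time
import uuid

SECRET = b"demo-signing-key"   # stands in for real cloud credentials
BUCKET_URL = "https://photos.example-bucket.s3.amazonaws.com"   # hypothetical bucket

def presign_put(key: str, ttl_s: int = 300) -> str:
    """Illustrative pre-signed PUT URL: an HMAC over method, key, and expiry.
    Real S3 uses SigV4; the shape of the guarantee is the same."""
    expires = int(time.time()) + ttl_s
    sig = hmac.new(SECRET, f"PUT:{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{BUCKET_URL}/{key}?expires={expires}&sig={sig}"

def create_post(user_id: int, caption: str) -> dict:
    """POST /posts: insert a pending row, hand the client an upload URL.
    The app server never sees the image bytes."""
    post_id = uuid.uuid4().hex   # a Snowflake ID in the real design
    raw_key = f"raw/{user_id}/{post_id}.jpg"
    # INSERT {post_id, user_id, caption, media_status: "pending"} into Post DB here.
    return {"post_id": post_id, "upload_url": presign_put(raw_key)}
```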


2. Users can view a profile page

I treat the profile page as the simpler read case before tackling the home feed merging problem. A database index on (user_id, post_id) is the only structure needed.

Components:

  • Post Service (updated): Serves profile page reads. Queries the Post DB using the user_id plus a cursor.
  • Post DB (updated): Index on (user_id, post_id) enables efficient per-user queries. Since post_id encodes creation time (Snowflake; covered in deep dive 4), this index gives chronological order without a separate timestamp index.

Request walkthrough:

  1. Client sends GET /users/{user_id}/posts?limit=12.
  2. Post Service queries: SELECT * FROM posts WHERE user_id = ? AND post_id < cursor ORDER BY post_id DESC LIMIT 12.
  3. Post Service returns the post list with a cursor encoding the last post_id.
flowchart LR
  C(["👤 Client\nMobile / web app"])
  PS["⚙️ Post Service\nQuery posts by user_id\nCursor pagination on post_id"]
  PDB[("🗄️ Post DB\nIndex on (user_id, post_id)\nSnowflake post_id encodes time")]
  S3[("🗄️ S3 Object Storage\nProcessed photo sizes\nServed via CDN")]
  MQ["📨 Kafka\nMediaUploadedEvent\n(from req 1 write path)"]

  C -->|"GET /users/{user_id}/posts?limit=12"| PS
  PS -->|"SELECT WHERE user_id = ? ORDER BY post_id DESC"| PDB
  PDB -->|"12 posts + next cursor"| PS
  PS -->|"{ posts, next_cursor }"| C
  PS -.->|"(write path) Publish MediaUploadedEvent"| MQ
  MQ -.->|"(write path) media stored here"| S3

Profile is a single-user read. Home feed requires merging posts across all followed accounts, which is the next two requirements.
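The Snowflake claim above is easy to verify: because the timestamp occupies the high bits, numeric ID order is creation order. A minimal sketch, assuming the common 41-bit-timestamp / 10-bit-worker / 12-bit-sequence layout:

```python
EPOCH_MS = 1_288_834_974_657   # Twitter's Snowflake epoch; any fixed epoch works

def snowflake(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """41-bit ms timestamp | 10-bit worker | 12-bit sequence."""
    return ((timestamp_ms - EPOCH_MS) << 22) | (worker_id << 12) | sequence

def creation_ms(post_id: int) -> int:
    """Recover the creation time from the high bits."""
    return (post_id >> 22) + EPOCH_MS

a = snowflake(1_700_000_000_000, worker_id=3, sequence=0)
b = snowflake(1_700_000_000_001, worker_id=1, sequence=0)
assert a < b                                  # later timestamp => larger ID
assert creation_ms(a) == 1_700_000_000_000    # timestamp is recoverable
```

This is why `ORDER BY post_id DESC` on the `(user_id, post_id)` index returns chronological order with no separate timestamp index.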


3. Users can follow and unfollow other users

The follow graph powers every home feed. It must answer two questions fast: who do I follow (for reading my home feed) and who follows me (for fan-out when I post). Both directions must be O(1) per lookup.

Components:

  • Follow Service: Handles POST and DELETE on follow relationships. Writes both directions of the adjacency graph on every operation.
  • Follow Store: Keyed adjacency lists in both directions: follower_id → [followee_ids] and followee_id → [follower_ids].

Request walkthrough:

  1. Client sends POST /users/{followee_id}/follows.
  2. Follow Service writes (follower_id, followee_id) in the forward direction and (followee_id, follower_id) in the reverse direction into the Follow Store.
  3. Follow Service returns 201 Created.
flowchart LR
  C(["👤 Client\nMobile / web app"])
  PS["⚙️ Post Service\nGenerate post_id · INSERT\nPublish MediaUploadedEvent"]
  PDB[("🗄️ Post DB\nPosts by user_id · Snowflake post_id")]
  S3[("🗄️ S3 Object Storage\nProcessed photo sizes\nServed via CDN")]
  MQ["📨 Kafka\nMediaUploadedEvent\nAt-least-once delivery"]
  FS["⚙️ Follow Service\nPOST creates follow edge\nDELETE removes follow edge\nMaintains both adjacency directions"]
  FDB[("🗄️ Follow Store\nfollower_id → [followee_ids]\nfollowee_id → [follower_ids]\nBoth directions maintained")]

  C -->|"POST /posts"| PS
  PS -->|"INSERT post"| PDB
  PS -->|"Publish MediaUploadedEvent"| MQ
  MQ -.->|"(async) media stored here"| S3
  C -->|"POST /follows · DELETE /follows"| FS
  FS -->|"Write both adjacency directions"| FDB

Maintaining both directions doubles the write cost on follow and unfollow. The payoff is O(1) reads for the two access patterns that run on every post write and every feed load. Computing one direction from the other at query time would require a full-table scan across billions of edges.
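The dual-direction write can be sketched in a few lines, with in-memory dicts standing in for the real partitioned Follow Store (Cassandra in the final architecture):

```python
from collections import defaultdict

class FollowStore:
    """Both adjacency directions, kept in lockstep on every write."""

    def __init__(self):
        self.following = defaultdict(set)   # follower_id -> {followee_ids}
        self.followers = defaultdict(set)   # followee_id -> {follower_ids}

    def follow(self, follower_id: int, followee_id: int) -> None:
        # Two writes per operation: forward edge and reverse edge.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: int, followee_id: int) -> None:
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)
```

Both hot-path lookups ("who do I follow" on feed reads, "who follows me" on fan-out) are then a single keyed read, at the cost of two writes per follow/unfollow.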


4. Users can view their home feed

This is the hard requirement. Home feed must merge posts from every followed account, sorted by recency, and serve under 200ms. A user following 500 accounts cannot trigger 500 database queries per feed load. The feed must be pre-computed.

Components:

  • Feed Workers: Async workers consuming PostPublishedEvent events from Kafka and writing post_ids into each follower's feed cache. These fire after media processing is complete.
  • Kafka: Durable message queue decoupling post publication from fan-out.
  • Redis Feed Cache: Per-user sorted sets. Key: home_feed:{user_id}. Score: post creation timestamp. Value: post_id. Capped at 800 entries per user.
  • Feed Service (new): Handles all home feed reads. Fetches post_ids from Redis and hydrates them into full post objects.

Request walkthrough (write path):

  1. Media processing completes (deep dive 1). The Media Worker publishes PostPublishedEvent { post_id, author_id, created_at } to the post-ready Kafka topic.
  2. Feed Worker reads the event and fetches the author's follower list from Follow Store.
  3. Feed Worker calls ZADD home_feed:{follower_id} {timestamp} {post_id} for each follower and trims the sorted set to 800 entries.

Request walkthrough (read path):

  1. Client sends GET /feed/home.
  2. Feed Service calls ZREVRANGE home_feed:{user_id} 0 11 on Redis.
  3. Feed Service batch-fetches full post objects for the returned post_ids.
  4. Feed Service returns the assembled feed.
flowchart LR
  C(["👤 Client\nMobile / web app"])
  PS["⚙️ Post Service\nINSERT post · Publish PostPublishedEvent\nFires after media_status = ready"]
  PDB[("🗄️ Post DB\nSource of truth for post content\nIndex on (user_id, post_id)")]

  subgraph AsyncTier["📨 Async Pipeline"]
    MQ["📨 Kafka\nPostPublishedEvent · durable\nAt-least-once delivery"]
    FW["⚙️ Feed Workers\nFetch follower list\nZADD to each feed · Trim to 800 entries"]
  end

  FDB[("🗄️ Follow Store\nget_followers(author_id)")]
  RC["⚡ Redis Feed Cache\nhome_feed:{user_id} sorted sets\nScore = timestamp · Value = post_id\n800 entries per user"]
  FSVC["⚙️ Feed Service\nZREVRANGE from Redis\nHydrate post_ids to full post objects"]

  C -->|"POST /posts (upload)"| PS
  PS -->|"INSERT post"| PDB
  PS -->|"Publish PostPublishedEvent"| MQ
  MQ -->|"Consume event"| FW
  FW -->|"get_followers(author_id)"| FDB
  FDB -->|"List of follower_ids"| FW
  FW -->|"ZADD home_feed:{follower_id}"| RC
  C -->|"GET /feed/home"| FSVC
  FSVC -->|"ZREVRANGE home_feed:{user_id}"| RC
  FSVC -->|"Batch fetch post content"| PDB
  FSVC -->|"{ posts, next_cursor }"| C

This is the baseline: posts write through Kafka into pre-computed Redis feeds, and home feed reads are served entirely from Redis. The weakness is the Feed Worker's naive loop over every follower: an account with 5 million followers triggers 5 million Redis writes from a single post, stalling the fan-out pipeline for every post queued behind it. Deep dive 2 addresses this with a hybrid strategy.
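The baseline write and read paths can be sketched with a plain dict standing in for the Redis sorted sets (the equivalent Redis commands are noted in comments), assuming a follow_store that exposes a followers mapping:

```python
FEED_CAP = 800
feed_cache: dict[int, list[tuple[int, int]]] = {}   # user_id -> [(score, post_id)]

def fan_out(post_id: int, author_id: int, created_at: int, follow_store) -> None:
    """Feed Worker: one write per follower per post (the naive loop)."""
    for follower_id in follow_store.followers[author_id]:
        feed = feed_cache.setdefault(follower_id, [])
        feed.append((created_at, post_id))   # ZADD home_feed:{id} created_at post_id
        feed.sort()
        del feed[:-FEED_CAP]                 # trim to the newest 800 entries

def read_feed(user_id: int, limit: int = 12) -> list[int]:
    """Feed Service: ZREVRANGE home_feed:{id} 0 limit-1, then hydrate from Post DB."""
    feed = feed_cache.get(user_id, [])
    return [post_id for _, post_id in sorted(feed, reverse=True)[:limit]]
```

The read path never touches the database for feed assembly; hydration of post_ids into full post objects is a separate batch lookup.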


Potential Deep Dives

1. How does the media processing pipeline work?

Three constraints drive the design:

  • Uploaded photos arrive in arbitrary formats and sizes. The app must serve multiple resolutions (thumbnail, standard, high-res) matched to device capability and network conditions.
  • The upload experience must be fast. Users should not wait for processing to complete before seeing confirmation.
  • Image resizing is CPU-intensive. Running it inline on the upload server would block request-handling capacity at peak upload rates.
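A skeleton of the worker that satisfies these constraints, shown here with in-memory dicts standing in for S3 and the Post DB. The resizing itself is stubbed (a real worker would use an image library such as Pillow); the point is the event flow: consume the event, write the processed sizes, flip media_status, then publish downstream.

```python
RESOLUTIONS = {"thumb": 150, "standard": 1080, "high": 2048}

def process_media(event: dict, storage: dict, post_db: dict, publish) -> None:
    """Media Worker: MediaUploadedEvent in, PostPublishedEvent out.
    storage/post_db/publish are illustrative stand-ins for S3, the Post DB,
    and the Kafka producer."""
    raw = storage[event["s3_raw_key"]]
    media_keys = {}
    for name, width in RESOLUTIONS.items():
        key = f"processed/{event['post_id']}/{name}.jpg"
        storage[key] = f"{raw}@{width}px"   # stand-in for actual resized bytes
        media_keys[name] = key
    post = post_db[event["post_id"]]
    post.update(media_keys=media_keys, media_status="ready")
    # Fan-out only fires after this point, never before.
    publish({"post_id": event["post_id"], "author_id": post["user_id"],
             "created_at": post["created_at"]})
```

Adding a new output resolution is a change to the RESOLUTIONS config, not to the upload request path.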

2. How do we generate home feeds at scale?

Three constraints define this problem:

  • Home feed loads must complete under 200ms p99.
  • A post from an account with millions of followers must not stall fan-out for all other users.
  • The write amplification from fan-out peaks at approximately 3,500 uploads/second times 300 average followers, equal to ~1.05M feed-cache writes per second at peak.
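The hybrid strategy's core decision (push for regular accounts, pull for influencers, with the 50K threshold from the final architecture) fits in a few lines. The helper names are illustrative, not a fixed API:

```python
INFLUENCER_THRESHOLD = 50_000   # follower count above which fan-out is skipped

def handle_post_published(event: dict, user_db: dict,
                          fan_out, index_influencer_post) -> None:
    """Hybrid fan-out: regular accounts are pushed into every follower's
    feed cache; influencer posts are only indexed, and the Feed Service
    merges them in at read time."""
    author = user_db[event["author_id"]]
    if author["follower_count"] > INFLUENCER_THRESHOLD:
        index_influencer_post(event)   # pull model: merged on feed read
    else:
        fan_out(event)                 # push model: write to follower feeds
```

This caps the per-post fan-out cost at the threshold, so a 5M-follower account never queues millions of cache writes behind everyone else's posts.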

3. How do we serve photos to 500M users under 100ms?

Three constraints drive this:

  • Photos are large binaries (50KB to 5MB). Serving from a single origin region adds 150 to 400ms of round-trip latency for users far from the origin.
  • Popular photos may be requested millions of times per hour. Fetching each from S3 on every request is expensive and slow.
  • Photo URLs must not be guessable. A user should not be able to access another account's private photo by constructing a URL.
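The non-guessable-URL constraint is typically met with signed, expiring URLs. A minimal sketch of the mechanism using an HMAC over the object key and expiry; real CDNs (e.g. CloudFront) have their own signing schemes, and the host and key names here are hypothetical:

```python
import hashlib
import hmac

CDN_SECRET = b"cdn-url-signing-key"     # shared between API tier and CDN edge
CDN_HOST = "https://cdn.example.com"    # hypothetical CDN hostname

def sign_photo_url(key: str, now: int, ttl_s: int = 3600) -> str:
    """Signed CDN URL with a 1-hour TTL, generated at API response time."""
    expires = now + ttl_s
    sig = hmac.new(CDN_SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"{CDN_HOST}/{key}?expires={expires}&sig={sig}"

def verify(url: str, now: int) -> bool:
    """Edge-side check: signature must match and must not be expired."""
    path, query = url.split("?", 1)
    params = dict(p.split("=") for p in query.split("&"))
    key = path[len(CDN_HOST) + 1:]
    expires = int(params["expires"])
    expected = hmac.new(CDN_SECRET, f"{key}:{expires}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["sig"]) and now < expires
```

Tampering with the key invalidates the signature, and a deleted post's URLs stop working within one TTL cycle without any per-URL invalidation.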

4. How do we store post metadata at scale?

Context: At 100M posts per day the post table accumulates over 36 billion rows per year. A single relational database cannot hold this dataset in memory or serve the two primary access patterns at scale: profile page queries (user_id + cursor) and post hydration by post_id from the feed cache.


Final Architecture

flowchart LR
  subgraph Clients["👤 Clients"]
    C(["👤 User\nMobile / Web"])
  end

  subgraph Gateway["🔀 Gateway"]
    AG["🔀 API Gateway\nAuth · Rate limit · Routing"]
  end

  subgraph AppTier["⚙️ App Services"]
    PS["⚙️ Post Service\nTwo-phase upload · pre-sign S3 URL\nPublish events on confirm"]
    FSVC["⚙️ Feed Service\nFeed reads · signing CDN URLs\nHybrid influencer merge"]
    FSvc["⚙️ Follow Service\nDual-write follow graph\nCache invalidation on change"]
  end

  subgraph AsyncTier["📨 Async Pipeline"]
    KM["📨 Kafka media-uploads\nMediaUploadedEvent · durable"]
    MW["⚙️ Media Workers\nResize to 3 sizes · JPEG compression\nUpdate media_status = ready"]
    KP["📨 Kafka post-ready\nPostPublishedEvent · fires after media ready"]
    FW["⚙️ Feed Workers\nHybrid fan-out · skip influencers\nSkip inactive users · ZADD + trim"]
  end

  subgraph CacheTier["⚡ Cache Tier"]
    RFC["⚡ Redis Feed Cache\nhome_feed:{uid} sorted sets\n800 post_ids · regular accounts only"]
    FRC["⚡ Redis Follow Cache\nfollowers:{uid} · 1h TTL\nInfluencer post_ids · 5m TTL"]
  end

  subgraph StorageTier["🗄️ Storage"]
    CASS1[("🗄️ Cassandra posts_by_user\nProfile page pattern\nPartition: user_id")]
    CASS2[("🗄️ Cassandra posts_by_id\nFeed hydration · batch lookup\nPartition: post_id")]
    CDST[("🗄️ Cassandra Follow Store\nBoth adjacency directions\n3-replica replication")]
    S3[("🗄️ S3 Storage\nRaw + processed photo sizes\nOrigin pull via CDN shield")]
    CDN["🌐 CDN + Origin Shield\nSigned URLs · <20ms edge\nRequest collapsing to S3"]
  end

  C -->|"POST /posts · PUT /posts/{id}/media"| AG
  C -->|"GET /feed/home · GET /users/{id}/posts"| AG
  C -->|"POST /follows · DELETE /follows"| AG
  AG -->|"Upload ops"| PS
  AG -->|"Feed reads"| FSVC
  AG -->|"Follow ops"| FSvc
  PS -->|"INSERT posts_by_user"| CASS1
  PS -->|"INSERT posts_by_id"| CASS2
  PS -->|"Publish MediaUploadedEvent"| KM
  KM -->|"Consume"| MW
  MW -->|"GET raw bytes · PUT processed sizes"| S3
  MW -->|"UPDATE media_status = ready"| CASS2
  MW -->|"Publish PostPublishedEvent"| KP
  KP -->|"Consume"| FW
  FW -->|"get_followers (regular only)"| FRC
  FRC -.->|"Cache miss"| CDST
  FW -->|"ZADD home_feed:{uid}"| RFC
  FSvc -->|"Dual-write both directions"| CDST
  FSvc -->|"DEL followers:{uid}"| FRC
  FSVC -->|"ZREVRANGE home_feed:{uid}"| RFC
  FSVC -->|"get_influencer_following"| CDST
  FSVC -->|"Hydrate post_ids"| CASS2
  FSVC -->|"Profile page query"| CASS1
  FSVC -->|"Sign CDN URLs per response"| CDN
  C -->|"GET photo bytes (signed URL)"| CDN
  CDN -.->|"Cache miss"| S3

The media pipeline and feed pipeline run on separate Kafka topics and fire in sequence: a photo upload triggers media processing, and fan-out workers only fire after media_status = ready. This guarantees users never see a feed entry pointing to a photo that has not finished processing.


Interview Cheat Sheet

  • Start by stating the two parallel async pipelines: one for media processing (raw upload to resized S3 photos) and one for feed fan-out (post_ids into follower caches). They connect via the PostPublishedEvent published only after media is ready.
  • The upload acknowledgment takes under 50ms because the app server never handles image bytes. The client uploads directly to S3 via a pre-signed URL; the server only validates, inserts metadata, and publishes a Kafka event.
  • Media workers are stateless and horizontally scalable. Adding workers increases processing throughput without touching the upload request path. Adding a new output resolution is a config change, not a code deployment.
  • State the read/write asymmetry early: 100M uploads per day versus 5B feed loads per day fetching 12 photos each. The read path dominates the write path by a factor of roughly 600.
  • Fan-out on read breaks above roughly 100 followed accounts per user. Fan-out on write breaks for influencers above roughly 50K followers. The hybrid strategy splits cleanly at the influencer threshold and eliminates both failure modes.
  • At 500M DAU storing 800 pre-computed post_ids per user at 8 bytes each, the feed cache totals 3.2TB. Plan for Redis Cluster from the start.
  • Skip fan-out writes for inactive users. A Redis key last_active:{user_id} set on each login with a 30-day TTL is all the gate logic needed. Reconstruct the inactive user's feed from the Follow Store and Post DB on their next login.
  • Serve photos through a CDN with signed URLs, not directly from S3. Signed URLs have a 1-hour TTL; deleted posts stop being accessible within one TTL cycle without invalidating every URL ever issued.
  • CDN origin shield collapses all edge-PoP cache misses into a single S3 fetch per object per cache period. A viral photo is not fetched from S3 a million times; it is fetched roughly once per region per hour.
  • Use Cassandra with two tables (posts_by_user keyed by user_id and posts_by_id keyed by post_id) to serve both access patterns: profile page queries and feed hydration. Dual-write on every post creation.
  • Use LOCAL_QUORUM consistency for Cassandra writes and LOCAL_ONE for feed hydration reads. This balances durability guarantees against read latency on the hot path.
  • Snowflake IDs for post_id make ORDER BY post_id DESC equivalent to ORDER BY created_at DESC. No secondary timestamp index is needed for chronological profile or feed queries.
  • The fan-out ordering guarantee: never publish PostPublishedEvent before media_status = ready. A user should never see a feed entry for a photo that has not finished processing and is not yet available on the CDN.
  • Sign CDN URLs at API response time, not at upload time. Revoking access to a deleted post requires waiting at most one 1-hour TTL cycle, with no impact on any other URL ever issued.
