The system design interview framework

TL;DR

Every high-level design (HLD) interview fits a 6-phase structure: Functional Requirements, Non-Functional Requirements, API Design, Flow Design, High-Level Architecture, and Deep Dives. The phases are sequential, but you loop back as needed.
Time allocation matters more than depth. The single biggest mistake is jumping to boxes on a whiteboard before clarifying what you're building and how it behaves.
The interviewer evaluates your process, not just your architecture. A clean, structured walkthrough of a solid design beats a brilliant design delivered as a stream of consciousness.
Senior and staff candidates differentiate by driving the conversation: you set the scope, you define the APIs, you trace the flows, and you decide what to deep-dive.
DB design, schema choices, and indexing strategies belong in the Deep Dive phase. That's where you prove you can actually build what you drew.

Why Structure Matters More Than Knowledge

You're 8 minutes into a "Design Instagram" interview. You've drawn four boxes on the whiteboard. The interviewer asks: "How would you handle the feed?" You realize you never clarified whether this is a chronological feed or an algorithmic one. You never asked about the scale. You backtrack, erase, redraw.

The remaining 35 minutes become a scramble. You know caching, you know sharding, you know CDNs. But you're spending mental energy reorganizing instead of designing. The interviewer's notes say: "jumped ahead, disorganized, had to be redirected."

I've watched hundreds of candidates do this. The ones who pass aren't the ones who know the most. They're the ones who never lose control of the conversation. Structure is what gives you that control.

For your interview: treat the 45 minutes like a project you're managing. You're the tech lead running an architecture review, not a student answering questions. That mindset shift changes everything about how you present.

The 'I know this' trap

Experienced engineers often skip structure because they've built real systems. The reasoning: "I've done this at work, I'll just talk through it." But an interview is not a design review with your team. Your interviewer has no context. Without explicit structure, your deep knowledge comes across as scattered thinking. The framework isn't training wheels. It's the protocol that lets your expertise shine.

The 6-Phase Framework

Here's the structure that works for any system design question, at any level, at any company. The time splits assume a 45-minute interview (adjust proportionally for 30 or 60 minutes).

Figure: the 6-phase framework as a left-to-right pipeline with key outputs at each phase: Functional Requirements (3 min), Non-Functional Requirements (2 min), API Design (3 min), Flow Design (5 min), High-Level Architecture (17 min), and Deep Dives (15 min). Notice how phases 1-4 together front-load a complete specification before you draw a single architecture box. That front-loading prevents the 'redraw at minute 20' disaster.

Phase	Time	Goal	Output
1. Functional Requirements	~3 min	Lock down what the system does	Bulleted list of core features + explicit out-of-scope list
2. Non-Functional Requirements	~2 min	Lock down how the system behaves	Scale numbers, latency targets, availability SLA, consistency model
3. API Design	~3 min	Define the contract	Key endpoints or interfaces with request/response shapes
4. Flow Design	~5 min	Trace the critical paths	Data flow for 2-3 core user actions
5. High-Level Architecture	~17 min	Build the system	Component diagram with labeled data flow
6. Deep Dives	~15 min	Prove depth	DB design, scaling strategy, failure modes for 2-3 components

These aren't rigid walls. You'll reference NFRs during the architecture phase. You'll adjust an API during a deep dive. But the sequence should be clear to your interviewer at all times.

I recommend literally saying this at the start:

"I'll begin by nailing down functional and non-functional requirements. Then I'll sketch the key APIs and trace the critical user flows. That'll set me up for the high-level architecture, and we'll deep-dive into the most interesting parts. I'll check in with you as we go."

That short opening buys you enormous goodwill. It signals you have a plan.

Phase 1: Functional Requirements (3 minutes)

This phase exists for one reason: to prevent you from designing the wrong system.

Propose scope, don't ask for it

"Design Instagram" isn't one system. It's twenty. You need to narrow it down fast.

Bad: "What features should I include?"
Good: "I'll focus on the core photo-sharing flow: upload, feed generation, and viewing. I'll leave stories, DMs, Reels, and search out of scope unless you'd like me to cover them."

See the difference? The first puts the burden on the interviewer. The second shows you understand the product and you're making a deliberate scoping decision. That's a staff-level signal.

My recommendation: list 3-5 functional requirements on the board and explicitly state what you're not building. The "not building" list matters as much as the "building" list. It shows you can manage scope, which is literally the job at senior+ levels.

What good functional requirements look like

For "Design Instagram":

Users can upload photos with captions
Users see a personalized feed of photos from accounts they follow
Users can like and comment on photos
Users can follow/unfollow other users

Out of scope: Stories, Reels, DMs, Search, Explore, Ads, Notifications

Four features. Four sentences. You've defined what you're building and what you're not. The interviewer knows exactly where this is going.

Phase 2: Non-Functional Requirements (2 minutes)

This is where most candidates are too vague. "It should be scalable and reliable" is meaningless. You need specific numbers and explicit choices.

The five NFRs that matter

Requirement	What to state	Why it drives design
Scale	"500M MAU, 10M DAU"	Drives sharding, caching, CDN decisions
Latency	"Feed loads under 200ms p99"	Determines whether you need a cache layer
Availability	"99.99% for reads, 99.9% for writes"	Determines replication and multi-region strategy
Consistency	"Feed: eventually consistent. Follows: strongly consistent"	Determines which data stores and replication modes each feature can use
Durability	"Zero data loss for uploaded photos"	Drives storage and backup strategy
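
To make the availability row concrete, it helps to convert each target into a yearly downtime budget. The snippet below is a minimal sketch of that arithmetic; the function name is hypothetical and the year is simplified to 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# The two targets from the NFR table: writes (99.9%) and reads (99.99%).
for target in (99.9, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Three nines leaves roughly 8.8 hours of downtime a year and four nines leaves under an hour, which is why the stricter read target is the one that tends to push you toward replication and multi-region.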

Interview tip: anchor your numbers

Don't pull numbers from thin air. Say: "Instagram has roughly 2 billion MAU. For this design, I'll scope to 500M MAU and about 10M DAU, a 2% DAU/MAU ratio. That's far below a real social app's daily engagement, so I'm treating it as a simplification that keeps the math clean." Anchoring to real-world references shows you've studied the domain.
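
Once the user numbers are anchored, you can turn them into request rates in a few seconds. Here is a rough sketch of that back-of-envelope math. Only the 10M DAU and 500M MAU figures come from the scoped requirements; the per-user activity rates and the peak multiplier are illustrative assumptions you would state out loud.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, actions_per_user_per_day: float) -> float:
    """Average requests per second for one kind of user action."""
    return daily_active_users * actions_per_user_per_day / SECONDS_PER_DAY


MAU = 500_000_000
DAU = 10_000_000

# Assumptions for illustration only, not figures from the requirements.
FEED_LOADS_PER_USER_PER_DAY = 10
UPLOADS_PER_USER_PER_DAY = 0.1
PEAK_FACTOR = 3

feed_qps = average_qps(DAU, FEED_LOADS_PER_USER_PER_DAY)
upload_qps = average_qps(DAU, UPLOADS_PER_USER_PER_DAY)

print(f"DAU/MAU ratio:  {DAU / MAU:.0%}")
print(f"Feed reads:     ~{feed_qps:,.0f}/s average, ~{feed_qps * PEAK_FACTOR:,.0f}/s peak")
print(f"Photo uploads:  ~{upload_qps:,.1f}/s average, ~{upload_qps * PEAK_FACTOR:,.1f}/s peak")
```

Under these assumptions reads outnumber writes by about 100 to 1, which is the kind of ratio that justifies a cache layer later.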

State the consistency model per feature, not globally

This is a nuance most candidates miss. "The system uses eventual consistency" is too broad. Different features have different consistency needs:

Feed: Eventually consistent (5-10 second staleness is invisible to users)
Like counts: Eventually consistent (slight delays are acceptable)
Follow/unfollow: Strongly consistent (unfollowing must take effect immediately)
Photo upload: Read-your-writes consistent (user must see their own upload right away)

Stating consistency per feature shows you understand that consistency is a spectrum, not a binary choice. That's a staff-level signal that costs you 15 extra seconds.

Phase 3: API Design (3 minutes)

This phase is where you define the contract between the client and the system. Skip it for purely internal backend questions, but for user-facing systems (which make up 80%+ of interview questions), this phase is essential.

Why API design before architecture

Most candidates go straight from requirements to boxes. The problem: without defined APIs, you don't know what data flows in, what flows out, or what operations the system needs to support. You end up designing components and then backtracking to figure out the interfaces.

Defining APIs first forces you to think about the system from the consumer's perspective. What does the client actually need? What request and response shapes do they expect?

Answering these prevents over-engineering (building components for operations nobody calls) and under-engineering (missing endpoints the client needs).

How to do it in 3 minutes

You're not writing OpenAPI specs. You're sketching 3-5 key endpoints:

POST   /photos          → Upload a photo (image, caption, user_id)
GET    /feed/{user_id}  → Get personalized feed (cursor, page_size)
POST   /likes           → Like a photo (photo_id, user_id)
POST   /follows         → Follow a user (follower_id, followee_id)
DELETE /follows         → Unfollow (follower_id, followee_id)

That's it. Five lines. The interviewer can now see: (1) what the system does from the outside, (2) you thought about pagination (cursor-based, not offset), (3) you separated read and write paths, (4) you understand REST resource modeling.
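
If the interviewer pushes on why cursor-based pagination beats offset, a small sketch helps. The version below assumes photo IDs are sortable by time (newer IDs are larger) and uses an in-memory list as a stand-in for the feed store. The function names and the cursor format are hypothetical.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(last_photo_id: int) -> str:
    """Pack the last item the client saw into an opaque token."""
    raw = json.dumps({"last_photo_id": last_photo_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(raw)["last_photo_id"]


def get_feed_page(
    feed_ids: List[int], cursor: Optional[str], page_size: int
) -> Tuple[List[int], Optional[str]]:
    """Return one page of photo IDs plus the cursor for the next page.

    feed_ids must be sorted newest first (descending IDs).
    """
    if cursor is None:
        remaining = feed_ids
    else:
        last_seen = decode_cursor(cursor)
        remaining = [pid for pid in feed_ids if pid < last_seen]
    page = remaining[:page_size]
    has_more = len(remaining) > page_size
    return page, (encode_cursor(page[-1]) if has_more else None)


feed = [50, 40, 30, 20, 10]
page, next_cursor = get_feed_page(feed, None, page_size=2)
feed.insert(0, 60)  # a new post arrives between requests
print(page, get_feed_page(feed, next_cursor, page_size=2)[0])
```

Because the cursor names the last item seen rather than a position, a new post arriving between requests does not shift the second page. An offset of 2 would have returned photo 40 again.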

When to skip or abbreviate

"Design the backend for X": Keep API design brief, focus on service-to-service interfaces
"How would you scale X?": Skip API design, go straight to architecture
"Design the data model for X": Skip API design, spend more time on schema and access patterns

For your interview: if the APIs are standard CRUD, sketch them in 60 seconds and move on. If the system has interesting API decisions (WebSocket vs. polling for real-time, GraphQL vs. REST for nested data), spend the full 3 minutes because the API choice itself becomes a talking point.

For protocol choice: default to REST for user-facing APIs, GraphQL when clients need flexible nested queries, and gRPC for internal service-to-service calls. State this in one sentence and move on.

REST vs. GraphQL vs. gRPC: when to pick which

REST: simple, well-understood, cacheable. Best for standard CRUD and public APIs.
GraphQL: flexible queries with nested relationships (e.g., social graphs where the client decides the depth).
gRPC: strongly typed, efficient binary protocol, supports streaming. Best for internal microservice communication where latency matters.

Phase 4: Flow Design (5 minutes)

This is the phase most frameworks skip, and it's the one that prevents the worst class of design mistakes: architectures that look good on paper but don't actually work when you trace a request through them.

What flow design means

Pick 2-3 core user actions from your functional requirements and trace the complete data path:

User does X on the client
Client sends a request to Y
Y does Z (reads from where? writes to where? calls what?)
Response flows back
User sees the result

Worked example: Instagram photo upload

User taps "Upload" on mobile app
  → Client compresses image, sends POST /photos with image + metadata
  → API Gateway authenticates, rate-limits, routes to Upload Service
  → Upload Service stores image in S3 (object storage), gets back a URL
  → Upload Service writes photo metadata (user_id, S3 URL, caption, timestamp) to DB
  → Upload Service publishes "new_photo" event to message queue
  → Response: 201 Created with photo_id returned to client
  → Async: Feed Service consumes event, fans out post to followers' feed caches

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant US as Upload Service
    participant S3 as S3 Storage
    participant DB as Photo DB
    participant MQ as Message Queue
    participant FS as Feed Service

    C->>GW: POST /photos (image + metadata)
    GW->>US: Route (after auth + rate limit)
    US->>S3: Store compressed image
    S3-->>US: S3 URL
    US->>DB: Write metadata (user_id, URL, caption)
    US->>MQ: Publish "new_photo" event
    US-->>C: 201 Created (photo_id)
    Note over MQ,FS: Async fanout
    MQ->>FS: Consume event
    FS->>FS: Fan out to follower feed caches
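
The upload flow above can also be sketched in code to make the sync/async split explicit. This is a toy model under stated assumptions: dictionaries and a deque stand in for S3, the Photo DB, the message queue, and the feed cache. The follow graph, the URL format, and all function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import count

object_store = {}                  # stand-in for S3: url -> image bytes
photo_db = {}                      # stand-in for the Photo DB: photo_id -> metadata
event_queue = deque()              # stand-in for the message queue
feed_cache = defaultdict(list)     # follower_id -> photo_ids, newest first
followers = {1: [2, 3]}            # hypothetical graph: users 2 and 3 follow user 1
_next_photo_id = count(1)


def upload_photo(user_id: int, image: bytes, caption: str) -> dict:
    """Synchronous path: store blob, write metadata, publish event, respond."""
    photo_id = next(_next_photo_id)
    url = f"s3://photos/{photo_id}"  # hypothetical URL scheme
    object_store[url] = image
    photo_db[photo_id] = {"user_id": user_id, "url": url, "caption": caption}
    event_queue.append({"type": "new_photo", "photo_id": photo_id, "user_id": user_id})
    return {"status": 201, "photo_id": photo_id}  # returned before any fan-out


def run_fanout_worker() -> None:
    """Asynchronous path: drain events and push into each follower's feed."""
    while event_queue:
        event = event_queue.popleft()
        for follower_id in followers.get(event["user_id"], []):
            feed_cache[follower_id].insert(0, event["photo_id"])


response = upload_photo(user_id=1, image=b"jpeg-bytes", caption="sunset")
print(response, dict(feed_cache))  # the client has its 201, feeds not yet updated
run_fanout_worker()
print(dict(feed_cache))            # both followers now have the photo
```

The point of the sketch is the ordering: the uploader gets a response as soon as the event is queued, and followers' feeds catch up afterwards. That gap is the eventual consistency you accepted for the feed.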

Worked example: Instagram feed load

User opens app, triggers GET /feed/{user_id}?cursor=X
  → API Gateway routes to Feed Service
  → Feed Service checks Redis for pre-computed feed
  → Cache HIT: return cached feed items (photo_ids + metadata)
  → Cache MISS: query Feed DB for user's timeline, hydrate with photo metadata
  → For each feed item, fetch like counts from Counters Service (or cached)
  → Response: JSON array of feed items with pagination cursor

sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant FS as Feed Service
    participant R as Redis Cache
    participant DB as Feed DB
    participant CS as Counters Service

    C->>GW: GET /feed/{user_id}?cursor=X
    GW->>FS: Route request
    FS->>R: Check pre-computed feed
    alt Cache HIT
        R-->>FS: Feed items
    else Cache MISS
        R-->>FS: Miss
        FS->>DB: Query timeline
        DB-->>FS: Raw feed items
    end
    FS->>CS: Fetch like counts (batched)
    CS-->>FS: Counts
    FS-->>C: JSON feed + pagination cursor
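
The read path is a cache-aside pattern, and a few lines of code make the hit and miss branches easy to talk through. In this sketch, plain dictionaries stand in for Redis, the Feed DB, and the Counters Service. Writing the result back to the cache on a miss is an assumption on my part (the usual cache-aside behavior), and all names and sample data are hypothetical.

```python
from typing import Dict, List

feed_cache: Dict[int, List[int]] = {}                    # stand-in for Redis
feed_db: Dict[int, List[int]] = {42: [103, 102, 101]}    # stand-in for the Feed DB
like_counts: Dict[int, int] = {101: 5, 102: 0, 103: 12}  # stand-in for Counters Service


def fetch_like_counts(photo_ids: List[int]) -> Dict[int, int]:
    """One batched lookup instead of one call per feed item."""
    return {pid: like_counts.get(pid, 0) for pid in photo_ids}


def get_feed(user_id: int) -> dict:
    photo_ids = feed_cache.get(user_id)
    cache_hit = photo_ids is not None
    if not cache_hit:
        # Cache miss: fall back to the database, then populate the cache.
        photo_ids = feed_db.get(user_id, [])
        feed_cache[user_id] = photo_ids
    counts = fetch_like_counts(photo_ids)
    items = [{"photo_id": pid, "likes": counts[pid]} for pid in photo_ids]
    return {"items": items, "cache_hit": cache_hit}


print(get_feed(42)["cache_hit"])  # first read goes to the database
print(get_feed(42)["cache_hit"])  # second read is served from the cache
```

Like counts are fetched on every read rather than stored with the cached feed, so a stale feed can still show reasonably fresh counts.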

Those two flows, written on the board in 4-5 minutes, tell the interviewer everything: which services exist, what data store each one uses, which paths are synchronous vs. async, and where the caching layer sits.

For your interview: always trace the write path and the read path separately. They're almost always different architectures. Making this separation explicit is a strong signal.

Phase 5: High-Level Architecture (17 minutes)

This is the main event. You're building the system. But because you've done the first four phases, you already know the requirements, the APIs, and how data flows. Drawing becomes assembly, not invention.

The drawing sequence

Start with the user and draw left to right (or top to bottom):

Draw the client (mobile app, web browser, API consumer)
Draw the infrastructure layer (load balancer, API gateway, CDN)
Draw the core services (the 2-3 services that handle your functional requirements)
Draw the data stores (which database? cache? object storage?)
Draw the async paths (message queues, event buses for anything non-synchronous)
Label every arrow with what flows through it

Here's what a Phase 5 diagram might look like for our Instagram example:

flowchart TD
    subgraph Clients["👤 Clients"]
        Mobile(["📱 Mobile App"])
        Web(["🌐 Web App"])
    end

    subgraph Infra["🔀 Infrastructure"]
        CDN["⚡ CDN\nStatic assets + images"]
        LB["🔀 Load Balancer"]
        GW["🔒 API Gateway\nAuth · Rate limit"]
    end

    subgraph Services["⚙️ App Services"]
        US["⚙️ Upload Service"]
        FS["⚙️ Feed Service"]
        UserS["⚙️ User Service"]
    end

    subgraph Async["📨 Async"]
        MQ["📨 Kafka\nFeed fanout events"]
    end

    subgraph Data["🗄️ Data Stores"]
        S3[("📦 S3\nPhoto blobs")]
        PG[("🟢 PostgreSQL\nUsers · Follows · Photo metadata")]
        Cass[("🗄️ Cassandra\nFeed timeline")]
        Redis["⚡ Redis\nFeed cache · Sessions"]
    end

    Mobile -->|"HTTPS"| CDN
    Web -->|"HTTPS"| CDN
    Mobile -->|"API calls"| LB
    Web -->|"API calls"| LB
    LB --> GW
    GW -->|"Upload"| US
    GW -->|"Feed read"| FS
    GW -->|"Profile"| UserS
    US -->|"Store image"| S3
    US -->|"Write metadata"| PG
    US -->|"Publish event"| MQ
    MQ -->|"Consume"| FS
    FS -->|"Cache read/write"| Redis
    FS -.->|"Cache miss"| Cass
    UserS -->|"Query"| PG

Narrate while drawing

Don't draw silently. Every box gets a one-sentence justification:

"I'm putting a load balancer here because we'll have multiple API servers to distribute traffic across."
"PostgreSQL for user profiles because we have relational queries and need ACID. Feed data in Cassandra because it's write-heavy and we can tolerate eventual consistency."
"Writes go through Kafka before hitting the feed cache because feed fanout at 50K writes/sec would overwhelm a synchronous write path."

Every component you add should be justified. If you can't explain why it's there in one sentence, you probably don't need it.

The overdesign trap

The most common mistake at the senior-to-staff transition: adding every component you know. Service mesh, circuit breakers, CQRS, event sourcing, Kafka, Redis, Cassandra, a CDN, a WAF, rate limiting... If your diagram looks like a cloud vendor's product catalog, you've lost the plot. Start simple. Add complexity only when a requirement forces it. The interviewer's follow-up question is your invitation to go deeper.

Justify with numbers, not instinct

When you add a component, connect it to your NFRs and quick estimates (see Capacity Planning for how to translate estimates into infrastructure decisions):

Component	When to add	One-sentence justification
Load Balancer	>1 app server (practically always)	"Distributes traffic and handles failover"
API Gateway	Multiple services need auth, rate limiting	"Centralizes cross-cutting concerns"
Cache (Redis)	Read-heavy workload, same data read repeatedly	"Absorbs read traffic, sub-ms latency"
Message Queue	Async processing, decoupled services	"Decouples write path from processing"
CDN	Static content, global users	"Serves static assets from edge, cuts latency"
Object Storage	Images, videos, files	"Cheap, durable, infinite scale for blobs"
Search Index	Full-text search, complex queries	"Elasticsearch for queries your DB can't handle"

Phase 6: Deep Dives (15 minutes)

You've drawn the architecture. The last 15 minutes are where you prove you can actually build the system, not just draw it.

What belongs in deep dives

This is where DB design, schema choices, and scaling strategy live. The architecture gave you the "what." Deep dives give you the "how."

Deep dive topic	What to cover
Database design	Table schemas, primary keys, indexes, access patterns, denormalization decisions
Scaling challenges	What breaks at 10x? Sharding strategy, partition keys, rebalancing approach
Caching strategy	What to cache, invalidation approach, TTL decisions, thundering herd mitigation
Consistency model	How you ensure the right consistency for each feature (strong vs. eventual)
Failure modes	What happens when component X goes down? Degradation strategy, recovery path
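
For the caching row, one thundering-herd mitigation you can sketch quickly is TTL jitter: spread expirations out so keys cached at the same moment don't all expire together. The snippet below is a minimal illustration, not the only option (request coalescing is another). The function name and the 10% default are assumptions.

```python
import random


def ttl_with_jitter(base_ttl_s: int, jitter_fraction: float = 0.1, rng=random) -> int:
    """Return the base TTL shifted by a random amount within +/- jitter_fraction."""
    spread = base_ttl_s * jitter_fraction
    return int(base_ttl_s + rng.uniform(-spread, spread))


# 1,000 keys cached in the same second now expire across a 60-second window
# instead of all at the 300-second mark.
seeded = random.Random(0)
ttls = [ttl_with_jitter(300, 0.1, seeded) for _ in range(1000)]
print(min(ttls), max(ttls))
```

You would pass the result as the expiry when writing each cache entry. The trade-off is slightly less predictable staleness in exchange for a smoother load on the database.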

How to pick what to dive into

Two strategies:

Let the interviewer lead: "Which component would you like me to go deeper on?" Collaborative and safe.
Proactively dive into the hardest part: "The most interesting engineering challenge here is the feed generation pipeline. Let me walk through the database design and scaling approach." This is the staff-level move.

My recommendation: offer the choice first. If the interviewer says "you pick," choose the component with the most interesting trade-offs.

DB design in deep dives: what good looks like

When you deep-dive on a database design, cover these four things:

1. Schema (30 seconds):

CREATE TABLE photos (
  photo_id    BIGINT PRIMARY KEY,  -- Snowflake ID for sortability
  user_id     BIGINT NOT NULL,
  s3_url      TEXT NOT NULL,
  caption     TEXT,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()  -- time zone aware; never NULL because the index orders on it
);
-- PostgreSQL: create index separately
CREATE INDEX idx_user_created ON photos (user_id, created_at DESC);

2. Access patterns (30 seconds): "The primary read pattern is: get all photos by user_id, ordered by created_at descending. That's a range scan on the composite index. Secondary: get photo by photo_id (primary key lookup, O(log n) on the B-tree index)."

3. Why this schema (30 seconds): "I chose a composite index on (user_id, created_at) because that's our hot query path. Declaring created_at descending keeps each user's newest entries adjacent in the index, so the range scan returns them in order without a separate sort."

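4. What changes at scale (30 seconds): "At 1B photos and a few hundred bytes per row, this table is a few hundred GB before indexes. As it outgrows a single primary, we'd partition by user_id range or hash. The composite index stays effective because the hot query always includes user_id. Lookups by photo_id alone would need the owner's user_id or a shard hint encoded in the ID to avoid a scatter-gather."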

That's a complete DB deep dive in 2 minutes. Specific, justified, and scale-aware.
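
If you're asked how hash partitioning by user_id would actually route a query, a short sketch is enough. The version below assumes 16 shards and a SHA-256 based hash. Both choices and the function name are illustrative.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id to a shard using a stable hash.

    A stable hash (rather than Python's built-in hash()) keeps the mapping
    identical across processes and restarts.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every photo row for a given user lands on the same shard, so the
# "photos by user_id, newest first" query touches exactly one shard.
print(shard_for_user(42), shard_for_user(42))
```

Plain modulo hashing is the simplest thing to draw, but changing the shard count remaps most users. If the interviewer asks about rebalancing, that is the cue to mention consistent hashing or a lookup table of virtual shards.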

The depth expected at each level

Level	What they want to see
Mid-level	List components and explain what each does
Senior	Trace data flow, identify bottlenecks, propose solutions
Staff	Make and justify trade-off decisions, design DB schemas, identify failure modes
Principal	Build vs. buy decisions, system-level cost optimization, multi-year evolution path

Interview tip: the 'what if' technique

Proactively ask yourself "what if" questions out loud during deep dives. "What if this node goes down?" "What if traffic spikes 10x?" "What if we need to add a new feature?" Each "what if" demonstrates that you think about systems the way operators do.
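
One way to answer "What if this node goes down?" is to show a degraded read path that serves slightly stale data instead of an error. The sketch below is a minimal illustration under assumptions: the 5-minute staleness limit, the exception type, and the function signature are all hypothetical, and a dictionary stands in for the cache.

```python
import time
from typing import Callable, Dict, List, Tuple

STALE_LIMIT_S = 300  # assumed limit: serve cached feeds up to 5 minutes old


class FeedServiceDown(Exception):
    """Raised by the fresh-read path when the backing service is unavailable."""


def get_feed_with_fallback(
    user_id: int,
    fetch_fresh: Callable[[int], List[int]],
    cache: Dict[int, Tuple[float, List[int]]],
    now: Callable[[], float] = time.time,
) -> dict:
    try:
        items = fetch_fresh(user_id)
        cache[user_id] = (now(), items)  # remember the last good answer
        return {"items": items, "stale": False}
    except FeedServiceDown:
        cached = cache.get(user_id)
        if cached is not None and now() - cached[0] <= STALE_LIMIT_S:
            return {"items": cached[1], "stale": True}  # degrade gracefully
        raise  # nothing recent enough to serve
```

The `stale` flag lets the client decide whether to show a "showing recent posts" hint. Past the staleness limit the error propagates, which is the point where you would talk about alerting and recovery.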

Adapting the Framework

For 30-minute interviews

Compress Phases 1-4 into 5 minutes total. State requirements and APIs as assertions rather than discussions. Spend 12 minutes on architecture, 13 on deep dives.

For 60-minute interviews

Use the extra time for a second deep dive and more detailed API/flow design. Add a "Bottlenecks & Evolution" section: "Here's what breaks first as we grow from 10M to 100M users, and here's the migration path."

For different question types

Question type	Framework adjustment
"Design X" (Twitter, Uber)	Full 6-phase framework
"Design the backend for X"	Abbreviate API design, go deep on services and DB
"How would you scale X?"	Skip Phases 1-4, go straight to architecture evolution
"Design the data model for X"	Phases 1-2, then DB design as main event

How This Shows Up in Interviews

For the 15 most common ways candidates fail this framework, see Common Pitfalls.

The signals interviewers look for

At senior+ level, interviewers evaluate three meta-skills:

Ownership of the conversation: Do you drive, or do you wait to be told what to do?
Trade-off articulation: Can you name the alternatives and explain why you chose this one?
Scope management: Can you decide what's important and what to skip?

The framework gives you all three. You drive by announcing the phases. You articulate trade-offs at every decision point. You manage scope in Phase 1 by explicitly setting boundaries.

Common interviewer follow-ups

Interviewer asks	Strong answer
"Why did you choose that database?"	"PostgreSQL for user profiles: relational joins for the social graph, ACID for follow/unfollow. Cassandra for feed data: optimized for write throughput, and our feed reads are simple key lookups."
"What happens if this service goes down?"	"The system degrades gracefully. Users see a cached feed (up to 5 min stale). Writes queue in Kafka and replay on recovery. We page on-call if p99 latency exceeds 500ms."
"How would you handle 10x traffic?"	"The stateless app tier scales horizontally behind the LB. Cache absorbs read amplification. For writes, we add Kafka partitions. The DB is the last bottleneck: we'd shard by user_id when single-primary writes exceed capacity."
"What would you do differently with more time?"	"I'd add Elasticsearch for content search, implement rate limiting at the API gateway, design the notification pipeline, and add distributed tracing for observability."

Interview tip: the opening sentence that changes the dynamic

Open with: "Here's how I'd like to structure our time. I'll spend a few minutes on requirements and APIs, trace the key flows, then build the architecture, and we'll deep-dive into the hardest parts. Sound good?" This transforms you from "candidate being evaluated" to "engineer leading a design session."

Quick Recap

The 6-phase framework (Functional Requirements → Non-Functional Requirements → API Design → Flow Design → High-Level Architecture → Deep Dives) gives you a structured path through any system design interview.
Phases 1-4 together form a complete specification. Spending 13 minutes here prevents the "redraw at minute 20" disaster that sinks most interviews.
API design forces you to think from the consumer's perspective, catching missing requirements like idempotency, pagination, and real-time vs. polling.
Flow design catches async gaps, race conditions, and write/read path mismatches before you commit to architecture.
DB design, schema choices, and indexing strategies belong in Deep Dives, where you prove you can build what you drew.
The highest-signal behaviors: announcing your structure at the start, checking in after each phase, and adapting when the interviewer steers.

Estimation: The skill of rapid back-of-envelope math. Use estimation inside Phases 2 and 5 to justify component choices with numbers instead of instinct.
Capacity Planning: Translating estimates into concrete infrastructure decisions. Essential for Phase 5 when deciding how many servers, what cache size, and when to shard.
Common Pitfalls: A pre-flight checklist of the 15 mistakes that fail interviews. Review this the night before every interview to internalize the anti-patterns.