The system design interview framework
A 6-phase framework for any system design interview: requirements, NFRs, APIs, flows, architecture, and deep dives, with time splits for each.
TL;DR
- Every HLD interview fits a 6-phase structure: Functional Requirements, Non-Functional Requirements, API Design, Flow Design, High-Level Architecture, and Deep Dives. The phases are sequential, but you loop back as needed.
- Time allocation matters more than depth. The single biggest mistake is jumping to boxes on a whiteboard before clarifying what you're building and how it behaves.
- The interviewer evaluates your process, not just your architecture. A clean, structured walkthrough of a solid design beats a brilliant design delivered as a stream of consciousness.
- Senior and staff candidates differentiate by driving the conversation: you set the scope, you define the APIs, you trace the flows, and you decide what to deep-dive.
- DB design, schema choices, and indexing strategies belong in the Deep Dive phase. That's where you prove you can actually build what you drew.
Why Structure Matters More Than Knowledge
You're 8 minutes into a "Design Instagram" interview. You've drawn four boxes on the whiteboard. The interviewer asks: "How would you handle the feed?" You realize you never clarified whether this is a chronological feed or an algorithmic one. You never asked about the scale. You backtrack, erase, redraw.
The remaining 35 minutes become a scramble. You know caching, you know sharding, you know CDNs. But you're spending mental energy reorganizing instead of designing. The interviewer's notes say: "jumped ahead, disorganized, had to be redirected."
I've watched hundreds of candidates do this. The ones who pass aren't the ones who know the most. They're the ones who never lose control of the conversation. Structure is what gives you that control.
For your interview: treat the 45 minutes like a project you're managing. You're the tech lead running an architecture review, not a student answering questions. That mindset shift changes everything about how you present.
The 'I know this' trap
Experienced engineers often skip structure because they've built real systems. The reasoning: "I've done this at work, I'll just talk through it." But an interview is not a design review with your team. Your interviewer has no context. Without explicit structure, your deep knowledge comes across as scattered thinking. The framework isn't training wheels. It's the protocol that lets your expertise shine.
The 6-Phase Framework
Here's the structure that works for any system design question, at any level, at any company. The time splits assume a 45-minute interview (adjust proportionally for 30 or 60 minutes).
| Phase | Time | Goal | Output |
|---|---|---|---|
| 1. Functional Requirements | ~3 min | Lock down what the system does | Bulleted list of core features + explicit out-of-scope list |
| 2. Non-Functional Requirements | ~2 min | Lock down how the system behaves | Scale numbers, latency targets, availability SLA, consistency model |
| 3. API Design | ~3 min | Define the contract | Key endpoints or interfaces with request/response shapes |
| 4. Flow Design | ~5 min | Trace the critical paths | Data flow for 2-3 core user actions |
| 5. High-Level Architecture | ~17 min | Build the system | Component diagram with labeled data flow |
| 6. Deep Dives | ~15 min | Prove depth | DB design, scaling strategy, failure modes for 2-3 components |
These aren't rigid walls. You'll reference NFRs during the architecture phase. You'll adjust an API during a deep dive. But the sequence should be clear to your interviewer at all times.
I recommend literally saying this at the start:
"I'll begin by nailing down functional and non-functional requirements. Then I'll sketch the key APIs and trace the critical user flows. That'll set me up for the high-level architecture, and we'll deep-dive into the most interesting parts. I'll check in with you as we go."
That short opening buys you enormous goodwill. It signals you have a plan.
Phase 1: Functional Requirements (3 minutes)
This phase exists for one reason: to prevent you from designing the wrong system.
Propose scope, don't ask for it
"Design Instagram" isn't one system. It's twenty. You need to narrow it down fast.
Bad: "What features should I include?" Good: "I'll focus on the core photo-sharing flow: upload, feed generation, and viewing. I'll leave stories, DMs, Reels, and search out of scope unless you'd like me to cover them."
See the difference? The first puts the burden on the interviewer. The second shows you understand the product and you're making a deliberate scoping decision. That's a staff-level signal.
My recommendation: list 3-5 functional requirements on the board and explicitly state what you're not building. The "not building" list matters as much as the "building" list. It shows you can manage scope, which is literally the job at senior+ levels.
What good functional requirements look like
For "Design Instagram":
- Users can upload photos with captions
- Users see a personalized feed of photos from accounts they follow
- Users can like and comment on photos
- Users can follow/unfollow other users
Out of scope: Stories, Reels, DMs, Search, Explore, Ads, Notifications
Four features. Four sentences. You've defined what you're building and what you're not. The interviewer knows exactly where this is going.
Phase 2: Non-Functional Requirements (2 minutes)
This is where most candidates are too vague. "It should be scalable and reliable" is meaningless. You need specific numbers and explicit choices.
The five NFRs that matter
| Requirement | What to state | Why it drives design |
|---|---|---|
| Scale | "500M MAU, 10M DAU" | Drives sharding, caching, CDN decisions |
| Latency | "Feed loads under 200ms p99" | Determines whether you need a cache layer |
| Availability | "99.99% for reads, 99.9% for writes" | Determines replication and multi-region strategy |
| Consistency | "Feed: eventually consistent. Follows: strongly consistent" | Determines your consistency model per feature |
| Durability | "Zero data loss for uploaded photos" | Drives storage and backup strategy |
Interview tip: anchor your numbers
Don't pull numbers from thin air. Say: "Instagram has roughly 2 billion MAU. For this design, I'll scope to 500M MAU and about 10M DAU, which gives us a 2% DAU/MAU ratio. That's deliberately low, but it keeps the math clean." Anchoring to real-world references shows you've studied the domain.
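If you want to show the arithmetic behind those numbers, a few lines are enough. Here's a minimal Python sketch: the 500M MAU and 10M DAU come from the example above, while the 20 feed loads per user per day and the 3x peak-to-average factor are hypothetical assumptions for illustration, not Instagram data.

```python
SECONDS_PER_DAY = 86_400

def dau_mau_ratio(dau: int, mau: int) -> float:
    """Fraction of monthly users who are active on a given day."""
    return dau / mau

def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Average requests/sec for one action, assuming traffic spread evenly over the day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY

MAU = 500_000_000
DAU = 10_000_000
FEED_LOADS_PER_USER_PER_DAY = 20  # hypothetical assumption
PEAK_TO_AVERAGE = 3               # hypothetical assumption

ratio = dau_mau_ratio(DAU, MAU)
feed_qps = average_qps(DAU, FEED_LOADS_PER_USER_PER_DAY)
peak_feed_qps = feed_qps * PEAK_TO_AVERAGE

print(f"DAU/MAU: {ratio:.0%}")                          # DAU/MAU: 2%
print(f"Average feed reads/sec: {feed_qps:,.0f}")       # Average feed reads/sec: 2,315
print(f"Peak feed reads/sec: {peak_feed_qps:,.0f}")     # Peak feed reads/sec: 6,944
```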
State the consistency model per feature, not globally
This is a nuance most candidates miss. "The system uses eventual consistency" is too broad. Different features have different consistency needs:
- Feed: Eventually consistent (5-10 second staleness is invisible to users)
- Like counts: Eventually consistent (slight delays are acceptable)
- Follow/unfollow: Strongly consistent (unfollowing must take effect immediately)
- Photo upload: Read-your-writes consistent (user must see their own upload right away)
Stating consistency per feature shows you understand that consistency is a spectrum, not a binary choice. That's a staff-level signal that costs you 15 extra seconds.
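A common way to deliver read-your-writes on top of asynchronous replicas is to send a user's reads to the primary for a short window after that user writes. The sketch below assumes a 5-second window and uses hypothetical names (`ReadRouter`, `record_write`, `target_for_read`); a real system would size the window from measured replication lag.

```python
import time
from typing import Callable, Dict

class ReadRouter:
    """Hypothetical router: read-your-writes on top of asynchronously replicated storage.

    After a user writes, that user's reads go to the primary until an assumed
    replication-lag window has passed; all other reads go to replicas.
    """

    def __init__(self, lag_window_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lag_window_s = lag_window_s
        self.clock = clock
        self.last_write: Dict[int, float] = {}  # user_id -> time of that user's last write

    def record_write(self, user_id: int) -> None:
        self.last_write[user_id] = self.clock()

    def target_for_read(self, user_id: int) -> str:
        written_at = self.last_write.get(user_id)
        if written_at is not None and self.clock() - written_at < self.lag_window_s:
            return "primary"  # a replica might not have this user's write yet
        return "replica"      # a few seconds of staleness is acceptable here

now = [100.0]                                  # fake clock for the example
router = ReadRouter(lag_window_s=5.0, clock=lambda: now[0])
router.record_write(user_id=42)                # user 42 just uploaded a photo
print(router.target_for_read(42))  # primary
print(router.target_for_read(7))   # replica
now[0] += 6.0
print(router.target_for_read(42))  # replica
```

In a deployment with many app servers, the last-write timestamp would need to live somewhere shared (for example, in the user's session) rather than in one process's memory.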
Phase 3: API Design (3 minutes)
This phase is where you define the contract between the client and the system. Abbreviate it for purely internal backend questions, but for user-facing systems (which is 80%+ of interview questions), this phase is essential.
Why API design before architecture
Most candidates go straight from requirements to boxes. The problem: without defined APIs, you don't know what data flows in, what flows out, or what operations the system needs to support. You end up designing components and then backtracking to figure out the interfaces.
Defining APIs first forces you to think about the system from the consumer's perspective. What does the client actually need? What request and response shapes do they expect?
Answering these prevents over-engineering (building components for operations nobody calls) and under-engineering (missing endpoints the client needs).
How to do it in 3 minutes
You're not writing OpenAPI specs. You're sketching 3-5 key endpoints:
```text
POST   /photos           → Upload a photo (image, caption, user_id)
GET    /feed/{user_id}   → Get personalized feed (cursor, page_size)
POST   /likes            → Like a photo (photo_id, user_id)
POST   /follows          → Follow a user (follower_id, followee_id)
DELETE /follows          → Unfollow (follower_id, followee_id)
```
That's it. Five lines. The interviewer can now see: (1) what the system does from the outside, (2) you thought about pagination (cursor-based, not offset), (3) you separated read and write paths, (4) you understand REST resource modeling.
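Cursor-based pagination is easy to claim and easy to get wrong. A minimal sketch, assuming the cursor is the sort key of the last item returned, a hypothetical `(created_at_ms, photo_id)` pair, encoded as opaque URL-safe base64 JSON. The helper names are illustrative, and the list filter stands in for a `WHERE` clause in a real query.

```python
import base64
import json
from typing import List, Optional, Tuple

Item = Tuple[int, int]  # (created_at_ms, photo_id); the feed is sorted newest first

def encode_cursor(item: Item) -> str:
    """Opaque cursor: the sort key of the last item on the page."""
    return base64.urlsafe_b64encode(json.dumps(list(item)).encode()).decode()

def decode_cursor(cursor: str) -> Item:
    created_at_ms, photo_id = json.loads(base64.urlsafe_b64decode(cursor))
    return created_at_ms, photo_id

def get_page(feed: List[Item], page_size: int,
             cursor: Optional[str] = None) -> Tuple[List[Item], Optional[str]]:
    """Return items strictly older than the cursor, plus the next cursor (or None)."""
    if cursor is not None:
        last_seen = decode_cursor(cursor)
        # Tuple comparison: older timestamp, or same timestamp with a smaller id.
        feed = [item for item in feed if item < last_seen]
    page = feed[:page_size]
    next_cursor = encode_cursor(page[-1]) if len(feed) > page_size else None
    return page, next_cursor

feed = [(1700, 9), (1700, 8), (1690, 7), (1680, 6), (1670, 5)]
page1, cursor = get_page(feed, page_size=2)
feed.insert(0, (1710, 10))                    # a new photo arrives between requests
page2, _ = get_page(feed, page_size=2, cursor=cursor)
print(page1)  # [(1700, 9), (1700, 8)]
print(page2)  # [(1690, 7), (1680, 6)]
```

The photo_id tie-breaker matters when two photos share a timestamp. With an offset of 2 instead of a cursor, the newly inserted photo would shift everything down and the second page would repeat (1700, 8).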
When to skip or abbreviate
- "Design the backend for X": Keep API design brief, focus on service-to-service interfaces
- "How would you scale X?": Skip API design, go straight to architecture
- "Design the data model for X": Skip API design, spend more time on schema and access patterns
For your interview: if the APIs are standard CRUD, sketch them in 60 seconds and move on. If the system has interesting API decisions (WebSocket vs. polling for real-time, GraphQL vs. REST for nested data), spend the full 3 minutes because the API choice itself becomes a talking point.
For protocol choice: default to REST for user-facing APIs, GraphQL when clients need flexible nested queries, and gRPC for internal service-to-service calls. State this in one sentence and move on.
REST vs. GraphQL vs. gRPC: when to pick which
REST: simple, well-understood, cacheable. Best for standard CRUD and public APIs. GraphQL: flexible queries with nested relationships (e.g., social graphs where the client decides the depth). gRPC: strongly typed, efficient binary protocol, supports streaming. Best for internal microservice communication where latency matters.
Phase 4: Flow Design (5 minutes)
This is the phase most frameworks skip, and it's the one that prevents the worst class of design mistakes: architectures that look good on paper but don't actually work when you trace a request through them.
What flow design means
Pick 2-3 core user actions from your functional requirements and trace the complete data path:
- User does X on the client
- Client sends a request to Y
- Y does Z (reads from where? writes to where? calls what?)
- Response flows back
- User sees the result
Worked example: Instagram photo upload
```text
User taps "Upload" on mobile app
  → Client compresses image, sends POST /photos with image + metadata
  → API Gateway authenticates, rate-limits, routes to Upload Service
  → Upload Service stores image in S3 (object storage), gets back a URL
  → Upload Service writes photo metadata (user_id, S3 URL, caption, timestamp) to DB
  → Upload Service publishes "new_photo" event to message queue
  → Feed Service consumes event, fans out post to followers' feed caches
  → Response: 201 Created with photo_id returned to client
```
```mermaid
sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant US as Upload Service
    participant S3 as S3 Storage
    participant DB as Photo DB
    participant MQ as Message Queue
    participant FS as Feed Service
    C->>GW: POST /photos (image + metadata)
    GW->>US: Route (after auth + rate limit)
    US->>S3: Store compressed image
    S3-->>US: S3 URL
    US->>DB: Write metadata (user_id, URL, caption)
    US->>MQ: Publish "new_photo" event
    US-->>C: 201 Created (photo_id)
    Note over MQ,FS: Async fanout
    MQ->>FS: Consume event
    FS->>FS: Fan out to follower feed caches
```
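The order of operations in the upload flow is the design decision worth calling out: blob first, then metadata, then the event, so the Feed Service never receives a photo_id that doesn't exist yet. A minimal in-memory sketch of that ordering; `BlobStore`, `PhotoDB`, and `EventQueue` are hypothetical stand-ins for S3, the photo database, and the message queue, not real client libraries.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class BlobStore:          # hypothetical stand-in for S3
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> str:
        self.objects[key] = data
        return f"s3://photos/{key}"      # hypothetical URL format

@dataclass
class PhotoDB:            # hypothetical stand-in for the photo metadata table
    rows: Dict[int, dict] = field(default_factory=dict)

    def insert(self, row: dict) -> None:
        self.rows[row["photo_id"]] = row

@dataclass
class EventQueue:         # hypothetical stand-in for the message queue
    events: List[dict] = field(default_factory=list)

    def publish(self, event: dict) -> None:
        self.events.append(event)

class UploadService:
    def __init__(self, blobs: BlobStore, db: PhotoDB, queue: EventQueue) -> None:
        self.blobs, self.db, self.queue = blobs, db, queue
        self._next_id = itertools.count(1)  # stand-in for a Snowflake-style ID generator

    def upload(self, user_id: int, image: bytes, caption: str) -> Tuple[int, dict]:
        photo_id = next(self._next_id)
        # 1. Blob first: an orphaned blob can be garbage-collected later,
        #    but a metadata row pointing at a missing blob breaks the feed.
        url = self.blobs.put(f"{user_id}/{photo_id}.jpg", image)
        # 2. Persist metadata so the photo exists before anyone is told about it.
        self.db.insert({"photo_id": photo_id, "user_id": user_id,
                        "s3_url": url, "caption": caption})
        # 3. Publish last; the Feed Service consumes this asynchronously.
        self.queue.publish({"type": "new_photo", "photo_id": photo_id, "user_id": user_id})
        return 201, {"photo_id": photo_id}

svc = UploadService(BlobStore(), PhotoDB(), EventQueue())
status, body = svc.upload(user_id=42, image=b"<jpeg bytes>", caption="sunset")
print(status, body)  # 201 {'photo_id': 1}
```

One gap this sketch ignores on purpose: if the service crashes between the metadata write and the publish, the event is lost. Patterns such as a transactional outbox close that gap, and naming it is a good deep-dive hook.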
Worked example: Instagram feed load
```text
User opens app, triggers GET /feed/{user_id}?cursor=X
  → API Gateway routes to Feed Service
  → Feed Service checks Redis for pre-computed feed
      → Cache HIT: return cached feed items (photo_ids + metadata)
      → Cache MISS: query Feed DB for user's timeline, hydrate with photo metadata
  → Fetch like counts for the page in one batched call to Counters Service (or from cache)
  → Response: JSON array of feed items with pagination cursor
```
```mermaid
sequenceDiagram
    participant C as Client
    participant GW as API Gateway
    participant FS as Feed Service
    participant R as Redis Cache
    participant DB as Feed DB
    participant CS as Counters Service
    C->>GW: GET /feed/{user_id}?cursor=X
    GW->>FS: Route request
    FS->>R: Check pre-computed feed
    alt Cache HIT
        R-->>FS: Feed items
    else Cache MISS
        R-->>FS: Miss
        FS->>DB: Query timeline
        DB-->>FS: Raw feed items
    end
    FS->>CS: Fetch like counts (batched)
    CS-->>FS: Counts
    FS-->>C: JSON feed + pagination cursor
```
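The read path above is cache-aside plus one batched counter lookup per page. A minimal sketch with plain dicts standing in for Redis, the feed timeline store, and the Counters Service; `FeedService` and its methods are hypothetical, and a real version would add TTLs, cursors, and error handling.

```python
from typing import Dict, List

class FeedService:
    """Hypothetical cache-aside feed reader backed by in-memory stand-ins."""

    def __init__(self, cache: Dict[int, List[int]], feed_db: Dict[int, List[int]],
                 like_counts: Dict[int, int]) -> None:
        self.cache = cache              # stand-in for Redis: user_id -> photo_ids
        self.feed_db = feed_db          # stand-in for the feed timeline store
        self.like_counts = like_counts  # stand-in for the Counters Service
        self.db_reads = 0               # how often we fell through to the DB

    def get_counts_batched(self, photo_ids: List[int]) -> Dict[int, int]:
        # One call for the whole page instead of one call per feed item.
        return {pid: self.like_counts.get(pid, 0) for pid in photo_ids}

    def get_feed(self, user_id: int, page_size: int = 10) -> List[dict]:
        photo_ids = self.cache.get(user_id)
        if photo_ids is None:                    # cache MISS
            self.db_reads += 1
            photo_ids = self.feed_db.get(user_id, [])
            self.cache[user_id] = photo_ids      # populate so the next read is a HIT
        page = photo_ids[:page_size]
        counts = self.get_counts_batched(page)
        return [{"photo_id": pid, "likes": counts[pid]} for pid in page]

svc = FeedService(cache={}, feed_db={1: [30, 20, 10]}, like_counts={30: 5, 10: 2})
first = svc.get_feed(user_id=1, page_size=2)   # MISS: reads the feed DB
second = svc.get_feed(user_id=1, page_size=2)  # HIT: served from the cache
print(first)         # [{'photo_id': 30, 'likes': 5}, {'photo_id': 20, 'likes': 0}]
print(svc.db_reads)  # 1
```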
Those two flows, written on the board in 4-5 minutes, tell the interviewer everything: which services exist, what data store each one uses, which paths are synchronous vs. async, and where the caching layer sits.
For your interview: always trace the write path and the read path separately. They're almost always different architectures. Making this separation explicit is a strong signal.
Phase 5: High-Level Architecture (17 minutes)
This is the main event. You're building the system. But because you've done the first four phases, you already know the requirements, the APIs, and how data flows. Drawing becomes assembly, not invention.
The drawing sequence
Start with the user and draw left to right (or top to bottom):
- Draw the client (mobile app, web browser, API consumer)
- Draw the infrastructure layer (load balancer, API gateway, CDN)
- Draw the core services (the 2-3 services that handle your functional requirements)
- Draw the data stores (which database? cache? object storage?)
- Draw the async paths (message queues, event buses for anything non-synchronous)
- Label every arrow with what flows through it
Here's what a Phase 5 diagram might look like for our Instagram example:
```mermaid
flowchart TD
    subgraph Clients["Clients"]
        Mobile(["Mobile App"])
        Web(["Web App"])
    end
    subgraph Infra["Infrastructure"]
        CDN["CDN<br/>Static assets + images"]
        LB["Load Balancer"]
        GW["API Gateway<br/>Auth · Rate limit"]
    end
    subgraph Services["App Services"]
        US["Upload Service"]
        FS["Feed Service"]
        UserS["User Service"]
    end
    subgraph Async["Async"]
        MQ["Kafka<br/>Feed fanout events"]
    end
    subgraph Data["Data Stores"]
        S3[("S3<br/>Photo blobs")]
        PG[("PostgreSQL<br/>Users · Follows · Photo metadata")]
        Cass[("Cassandra<br/>Feed timeline")]
        Redis["Redis<br/>Feed cache · Sessions"]
    end
    Mobile -->|"HTTPS"| CDN
    Web -->|"HTTPS"| CDN
    Mobile -->|"API calls"| LB
    Web -->|"API calls"| LB
    LB --> GW
    GW -->|"Upload"| US
    GW -->|"Feed read"| FS
    GW -->|"Profile"| UserS
    US -->|"Store image"| S3
    US -->|"Write metadata"| PG
    US -->|"Publish event"| MQ
    MQ -->|"Consume"| FS
    FS -->|"Cache read/write"| Redis
    FS -.->|"Cache miss"| Cass
    UserS -->|"Query"| PG
```
Narrate while drawing
Don't draw silently. Every box gets a one-sentence justification:
- "I'm putting a load balancer here because we'll have multiple API servers to distribute traffic across."
- "PostgreSQL for user profiles because we have relational queries and need ACID. Feed data in Cassandra because it's write-heavy and we can tolerate eventual consistency."
- "Writes go through Kafka before hitting the feed cache because feed fanout at 50K writes/sec would overwhelm a synchronous write path."
Every component you add should be justified. If you can't explain why it's there in one sentence, you probably don't need it.
The overdesign trap
The most common mistake at the senior-to-staff transition: adding every component you know. Service mesh, circuit breakers, CQRS, event sourcing, Kafka, Redis, Cassandra, a CDN, a WAF, rate limiting... If your diagram looks like a cloud vendor's product catalog, you've lost the plot. Start simple. Add complexity only when a requirement forces it. The interviewer's follow-up question is your invitation to go deeper.
Justify with numbers, not instinct
When you add a component, connect it to your NFRs and quick estimates (see Capacity Planning for how to translate estimates into infrastructure decisions):
| Component | When to add | One-sentence justification |
|---|---|---|
| Load Balancer | >1 app server (practically always) | "Distributes traffic and handles failover" |
| API Gateway | Multiple services need auth, rate limiting | "Centralizes cross-cutting concerns" |
| Cache (Redis) | Read-heavy workload, same data read repeatedly | "Absorbs read traffic, sub-ms latency" |
| Message Queue | Async processing, decoupled services | "Decouples write path from processing" |
| CDN | Static content, global users | "Serves static assets from edge, cuts latency" |
| Object Storage | Images, videos, files | "Cheap, durable, effectively unlimited scale for blobs" |
| Search Index | Full-text search, complex queries | "Elasticsearch for queries your DB can't handle" |
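Here's one way the 50K writes/sec figure behind the Kafka justification could be derived. Every input is a hypothetical assumption (uploads per day, average followers, peak factor), chosen only to show the shape of the estimate: under fan-out-on-write, each upload produces one feed-cache write per follower.

```python
SECONDS_PER_DAY = 86_400

def fanout_writes_per_sec(uploads_per_day: float, avg_followers: float,
                          peak_to_average: float = 1.0) -> float:
    """Feed-cache writes/sec under fan-out-on-write: one write per follower per upload."""
    uploads_per_sec = uploads_per_day / SECONDS_PER_DAY
    return uploads_per_sec * avg_followers * peak_to_average

UPLOADS_PER_DAY = 10_000_000  # assumption: ~1 upload per DAU per day (generous)
AVG_FOLLOWERS = 200           # assumption
PEAK_TO_AVERAGE = 2           # assumption

avg = fanout_writes_per_sec(UPLOADS_PER_DAY, AVG_FOLLOWERS)
peak = fanout_writes_per_sec(UPLOADS_PER_DAY, AVG_FOLLOWERS, PEAK_TO_AVERAGE)
print(f"average: {avg:,.0f} writes/sec, peak: {peak:,.0f} writes/sec")
# average: 23,148 writes/sec, peak: 46,296 writes/sec
```

Under these assumptions, the peak lands in the same order of magnitude as 50K writes/sec. A handful of accounts with millions of followers can dominate the real number, which is why fanout strategy makes a good deep-dive topic.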
Phase 6: Deep Dives (15 minutes)
You've drawn the architecture. The last 15 minutes are where you prove you can actually build the system, not just draw it.
What belongs in deep dives
This is where DB design, schema choices, and scaling strategy live. The architecture gave you the "what." Deep dives give you the "how."
| Deep dive topic | What to cover |
|---|---|
| Database design | Table schemas, primary keys, indexes, access patterns, denormalization decisions |
| Scaling challenges | What breaks at 10x? Sharding strategy, partition keys, rebalancing approach |
| Caching strategy | What to cache, invalidation approach, TTL decisions, thundering herd mitigation |
| Consistency model | How you ensure the right consistency for each feature (strong vs. eventual) |
| Failure modes | What happens when component X goes down? Degradation strategy, recovery path |
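For thundering-herd mitigation, one common technique is request coalescing: when many requests miss the same cache key at once, only one of them loads the value and the rest wait for that result. A minimal thread-based sketch; `SingleFlight` is a hypothetical helper loosely modeled on the idea behind Go's `singleflight` package, not a port of it, and the 0.5-second "query" is simulated.

```python
import threading
from typing import Any, Callable, Dict, Optional

class _Call:
    """One in-flight computation that concurrent callers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None

class SingleFlight:
    """Hypothetical helper: concurrent calls with the same key share one execution of fn."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._calls[key] = call
        if is_leader:
            try:
                call.result = fn()        # only the leader hits the database
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]  # later misses start a fresh load
                call.done.set()           # wake every waiting follower
        else:
            call.done.wait()              # followers reuse the leader's result
        if call.error is not None:
            raise call.error
        return call.result

loads = []

def load_feed_from_db() -> str:
    loads.append(1)                       # count how often the "database" is hit
    threading.Event().wait(0.5)           # simulate a slow query
    return "feed-for-user-42"

flight = SingleFlight()
barrier = threading.Barrier(10)           # release 10 cache misses at once
results = []

def on_cache_miss() -> None:
    barrier.wait()
    results.append(flight.do("feed:42", load_feed_from_db))

threads = [threading.Thread(target=on_cache_miss) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(loads), len(results))  # expect one load shared by ten callers
```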
How to pick what to dive into
Two strategies:
- Let the interviewer lead: "Which component would you like me to go deeper on?" Collaborative and safe.
- Proactively dive into the hardest part: "The most interesting engineering challenge here is the feed generation pipeline. Let me walk through the database design and scaling approach." This is the staff-level move.
My recommendation: offer the choice first. If the interviewer says "you pick," choose the component with the most interesting trade-offs.
DB design in deep dives: what good looks like
When you deep-dive on a database design, cover these four things:
1. Schema (30 seconds):
```sql
CREATE TABLE photos (
    photo_id   BIGINT PRIMARY KEY,   -- Snowflake ID for sortability
    user_id    BIGINT NOT NULL,
    s3_url     TEXT NOT NULL,
    caption    TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
-- PostgreSQL: create index separately
CREATE INDEX idx_user_created ON photos (user_id, created_at DESC);
```
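2. Access patterns (30 seconds): "The primary read pattern is: get all photos by user_id, ordered by created_at descending. That's a range scan on the composite index. Secondary: get a photo by photo_id, a primary-key B-tree lookup (O(log n), effectively constant in practice)."
3. Why this schema (30 seconds): "I chose a composite index on (user_id, created_at) because that's our hot query path. All of a user's index entries sit next to each other, already ordered by created_at, so 'latest N photos' is one range scan with no sort step. Declaring created_at DESC documents that intent; PostgreSQL can also scan a B-tree backward, so an ascending index would serve the query too."
4. What changes at scale (30 seconds): "At 1B photos, this table holds only a few hundred GB of metadata (a few hundred bytes per row including indexes; the images themselves live in S3), so storage isn't the forcing function. Write throughput and keeping the hot index in memory are. We'd partition by user_id range or hash. The hot query always includes user_id, so it stays on one partition; photo_id lookups need the shard encoded in the ID or a small photo_id-to-shard lookup."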
That's a complete DB deep dive in 2 minutes. Specific, justified, and scale-aware.
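You can check the claim that both access patterns are index lookups with nothing but Python's built-in `sqlite3` module. SQLite stands in for PostgreSQL here, so the DDL is adapted (INTEGER instead of BIGINT and TIMESTAMP) and the planner is SQLite's, not Postgres's. Treat this as an illustration of the idea, and use `EXPLAIN` on a real PostgreSQL instance for the real answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (
        photo_id   INTEGER PRIMARY KEY,  -- SQLite rowid alias, stands in for BIGINT
        user_id    INTEGER NOT NULL,
        s3_url     TEXT NOT NULL,
        caption    TEXT,
        created_at INTEGER NOT NULL      -- epoch millis, to keep the sketch simple
    );
    CREATE INDEX idx_user_created ON photos (user_id, created_at DESC);
""")

def plan(sql: str, params: tuple) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# Primary pattern: a user's most recent photos.
recent = plan("SELECT photo_id, created_at FROM photos "
              "WHERE user_id = ? ORDER BY created_at DESC LIMIT 20", (42,))
# Secondary pattern: one photo by primary key.
by_id = plan("SELECT * FROM photos WHERE photo_id = ?", (1,))

print(recent)  # expect a SEARCH using idx_user_created and no "TEMP B-TREE" sort step
print(by_id)   # expect a SEARCH using the integer primary key
```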
The depth expected at each level
| Level | What they want to see |
|---|---|
| Mid-level | List components and explain what each does |
| Senior | Trace data flow, identify bottlenecks, propose solutions |
| Staff | Make and justify trade-off decisions, design DB schemas, identify failure modes |
| Principal | Build vs. buy decisions, system-level cost optimization, multi-year evolution path |
Interview tip: the 'what if' technique
Proactively ask yourself "what if" questions out loud during deep dives. "What if this node goes down?" "What if traffic spikes 10x?" "What if we need to add a new feature?" Each "what if" demonstrates that you think about systems the way operators do.
Adapting the Framework
For 30-minute interviews
Compress Phases 1-4 into 5 minutes total. State requirements and APIs as assertions rather than discussions. Spend 12 minutes on architecture, 13 on deep dives.
For 60-minute interviews
Use the extra time for a second deep dive and more detailed API/flow design. Add a "Bottlenecks & Evolution" section: "Here's what breaks first as we grow from 10M to 100M users, and here's the migration path."
For different question types
| Question type | Framework adjustment |
|---|---|
| "Design X" (Twitter, Uber) | Full 6-phase framework |
| "Design the backend for X" | Abbreviate API design, go deep on services and DB |
| "How would you scale X?" | Skip Phases 1-4, go straight to architecture evolution |
| "Design the data model for X" | Phases 1-2, then DB design as main event |
How This Shows Up in Interviews
For the 15 most common ways candidates fail this framework, see Common Pitfalls.
The signals interviewers look for
At senior+ level, interviewers evaluate three meta-skills:
- Ownership of the conversation: Do you drive, or do you wait to be told what to do?
- Trade-off articulation: Can you name the alternatives and explain why you chose this one?
- Scope management: Can you decide what's important and what to skip?
The framework gives you all three. You drive by announcing the phases. You articulate trade-offs at every decision point. You manage scope in Phase 1 by explicitly setting boundaries.
Common interviewer follow-ups
| Interviewer asks | Strong answer |
|---|---|
| "Why did you choose that database?" | "PostgreSQL for user profiles: relational joins for the social graph, ACID for follow/unfollow. Cassandra for feed data: optimized for write throughput, and our feed reads are simple key lookups." |
| "What happens if this service goes down?" | "The system degrades gracefully. Users see a cached feed (up to 5 min stale). Writes queue in Kafka and replay on recovery. We page on-call if p99 latency exceeds 500ms." |
| "How would you handle 10x traffic?" | "The stateless app tier scales horizontally behind the LB. Cache absorbs read amplification. For writes, we add Kafka partitions. The DB is the last bottleneck: we'd shard by user_id when single-primary writes exceed capacity." |
| "What would you do differently with more time?" | "I'd add Elasticsearch for content search, design the notification pipeline, and add distributed tracing for observability." |
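If the follow-up goes to "how do you shard by user_id?", a routing function you can write on the spot helps. A minimal sketch, assuming many logical shards mapped onto a few physical databases; the shard count and host names are hypothetical. It hashes with `hashlib` instead of Python's built-in `hash()`, which is randomized per process for strings and bytes.

```python
import hashlib
from typing import Dict, List

NUM_LOGICAL_SHARDS = 4096  # hypothetical: many more logical shards than databases

def logical_shard(user_id: int) -> int:
    """Stable user_id -> logical shard mapping, identical on every app server."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS

def build_shard_map(hosts: List[str]) -> Dict[int, str]:
    """Assign logical shards to physical databases round-robin (hypothetical layout)."""
    return {shard: hosts[shard % len(hosts)] for shard in range(NUM_LOGICAL_SHARDS)}

def host_for_user(user_id: int, shard_map: Dict[int, str]) -> str:
    return shard_map[logical_shard(user_id)]

shard_map = build_shard_map(["pg-0", "pg-1", "pg-2", "pg-3"])  # hypothetical host names
print(host_for_user(42, shard_map))
```

Hashing into many logical shards rather than directly into the number of databases means adding capacity moves whole logical shards and updates the map, instead of rehashing every user.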
Interview tip: the opening sentence that changes the dynamic
Open with: "Here's how I'd like to structure our time. I'll spend a few minutes on requirements and APIs, trace the key flows, then build the architecture, and we'll deep-dive into the hardest parts. Sound good?" This transforms you from "candidate being evaluated" to "engineer leading a design session."
Quick Recap
- The 6-phase framework (Functional Requirements → Non-Functional Requirements → API Design → Flow Design → High-Level Architecture → Deep Dives) gives you a structured path through any system design interview.
- Phases 1-4 together form a complete specification. Spending 13 minutes here prevents the "redraw at minute 20" disaster that sinks most interviews.
- API design forces you to think from the consumer's perspective, catching missing requirements like idempotency, pagination, and real-time vs. polling.
- Flow design catches async gaps, race conditions, and write/read path mismatches before you commit to architecture.
- DB design, schema choices, and indexing strategies belong in Deep Dives, where you prove you can build what you drew.
- The single highest-signal behavior: announcing your structure at the start, checking in after each phase, and adapting when the interviewer steers.
Related Concepts
- Estimation: The skill of rapid back-of-envelope math. Use estimation inside Phases 2 and 5 to justify component choices with numbers instead of instinct.
- Capacity Planning: Translating estimates into concrete infrastructure decisions. Essential for Phase 5 when deciding how many servers, what cache size, and when to shard.
- Common Pitfalls: A pre-flight checklist of the 15 mistakes that fail interviews. Review this the night before every interview to internalize the anti-patterns.