📐HowToHLD

The 15 mistakes that fail system design interviews

The 15 most common system design interview mistakes, organized by category, with specific bad examples and concrete fixes for each.

32 min read · 2026-03-27 · medium · Tags: common-pitfalls, interview-mistakes, system-design, interview-framework, hld

TL;DR

  • The most common interview failure isn't wrong architecture. It's wrong process: jumping to components before requirements, designing silently, or spending 15 minutes on estimation math.
  • Over-engineering kills more interviews than under-engineering. Adding Kafka, CQRS, and event sourcing to a system with 100 req/sec is a red flag, not a strength.
  • The interviewer is evaluating how you think, not what you know. A candidate who says "I don't know but here's how I'd reason about it" passes more often than one who gives a memorized answer with no reasoning.
  • Every pitfall in this guide has a concrete fix. Read the "bad" example, understand why it's bad, then internalize the "good" version.
  • This article is your pre-flight checklist. Review it the night before every interview.

Why Good Engineers Fail Interviews

Here's an uncomfortable truth: the engineers who fail system design interviews are often excellent at their jobs. They've built real systems, operated them at scale, and solved hard problems. But the interview is a different environment with different rules.

At work, you have context. Your team knows the codebase, the constraints, the history. In an interview, you have 45 minutes with a stranger who knows nothing about your mental model. The skills that make you effective at work (intuition, tribal knowledge, working code) don't transfer to an interview where you must externalize your reasoning from scratch.

I've debriefed hundreds of interviews at top tech companies. The same 15 mistakes appear in over 80% of failed interviews. None of them are about lacking knowledge. All of them are about presentation, process, or judgment.

The good news: every single one is fixable with awareness and practice.

[Figure: pitfalls grouped into three categories: process mistakes (40% of failures), design mistakes (35%), and communication mistakes (25%), with the top pitfalls in each.]
Interview failures cluster into three categories. Process mistakes are the most common and the easiest to fix because they don't require new technical knowledge.

Process Mistakes (40% of failures)

These are the most painful because they're the easiest to fix. You don't need to learn new technology. You need to change how you structure your 45 minutes.

Pitfall 1: Jumping to components before requirements

What it looks like:

"Design Twitter? OK, so we'll need a load balancer, then some app servers, a Redis cache for the timeline, Cassandra for tweets, Kafka for the feed pipeline, a CDN for images..."

The candidate is drawing boxes before asking a single question. They don't know if this is Twitter for 1M users or 1B users. They don't know if the focus is feed generation, search, or DMs. They're designing from muscle memory, not from requirements.

Why it fails:

The interviewer's first thought is: "This candidate will build what they know, not what the problem requires." That's a dangerous trait on a real team. When the interview question has an uncommon twist (offline-first, real-time collaboration, geo-distributed), this candidate will miss it entirely because they never asked.

The fix:

Spend 3-5 minutes on requirements. Propose a scope: "I'll focus on the core feed: posting tweets, generating timelines, and reading feeds. I'll leave DMs, search, and trending topics for later unless you'd prefer I cover those." Then state non-functional requirements with numbers: "I'll design for 10M DAU, 200ms p99 feed latency, and eventual consistency on the timeline."

My recommendation: write the requirements on the board before drawing a single box. The act of writing them down forces clarity and gives the interviewer a chance to steer.

Pitfall 2: The 15-minute estimation spiral

What it looks like:

The candidate spends 15 minutes computing storage for tweets, retweets, likes, user profiles, direct messages, media, analytics events, and audit logs. They debate whether a tweet is 280 bytes or 300 bytes. They compute storage for both and present the difference.

Why it fails:

Estimation is not the interview. It's a 5-minute tool to inform the design. The candidate just burned a third of the interview on arithmetic that didn't change any design decision. The storage difference between "10 TB" and "12 TB" changes exactly zero architectural choices.

The fix:

Cap estimation at 5 minutes. Only estimate numbers that drive decisions. Ask yourself: "If this number were 2x higher, would I design differently?" If no, skip it. See Estimation for the 3-step formula.
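The "would 2x change my design?" test can be made concrete. Here is a hedged back-of-envelope sketch: the 10M DAU figure comes from the requirements stated earlier; the per-user load and read:write ratio are illustrative assumptions, not numbers from this article.

```python
# Back-of-envelope estimation: compute only numbers that change a decision.
# 10M DAU is the stated NFR; the other inputs are illustrative assumptions.

SECONDS_PER_DAY = 86_400
dau = 10_000_000              # daily active users (stated requirement)
feed_loads_per_user = 50      # assumed feed reads per user per day
read_write_ratio = 100        # assumed read-heavy workload

reads_per_sec = dau * feed_loads_per_user / SECONDS_PER_DAY
writes_per_sec = reads_per_sec / read_write_ratio

print(f"~{reads_per_sec:,.0f} reads/sec")    # thousands -> justifies a cache
print(f"~{writes_per_sec:,.0f} writes/sec")  # tiny -> one primary DB is fine

# Decision check: "If this number were 2x higher, would I design differently?"
# If the answer is no, stop estimating; further precision buys nothing.
```

Two multiplications and one division: that is the entire estimation budget this pitfall allows.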

The precision trap

"We need 4.217 TB of storage over 5 years." Nobody has ever said this in a production capacity planning meeting. The answer is "roughly 5 TB." False precision in estimation signals that you don't understand what estimation is for: enabling decisions, not winning math competitions.

Pitfall 3: No structure announcement

What it looks like:

The candidate starts talking. Sometimes about requirements, sometimes about components, sometimes about scaling. The interviewer can't predict what's coming next. There's no visible plan.

Why it fails:

The interviewer makes a decision about your communication ability in the first 2 minutes. Starting without announcing a structure tells them: "This candidate will ramble in design reviews, and I'll spend meetings trying to figure out where they're going."

The fix:

One sentence at the start: "I'll spend 5 minutes on requirements and estimates, 20 minutes on the high-level architecture, and 15 minutes diving deep into the most interesting components. I'll check in with you as we go." Done. See Approach & Structure.

Pitfall 4: Treating the interview as a monologue

What it looks like:

The candidate talks for 20 minutes straight without pausing, checking in, or asking the interviewer a question. The interviewer tries to interject ("What about...") and the candidate says "I'll get to that" and keeps going.

Why it fails:

System design interviews are collaborative. The interviewer has hints they want to give you. They have areas they want to explore. When you monologue, you prevent them from guiding you toward the aspects they care about. Worse, you miss their signals that something is wrong with your approach.

The fix:

Check in after every phase. After requirements: "Does this scope look right, or would you like me to adjust?" After the high-level design: "Before I go deeper, any component you'd like me to prioritize?" During deep dives: "I went into detail on X. Should I continue, or would you rather explore Y?"

These check-ins cost 10 seconds each and save you from spending 15 minutes on something the interviewer doesn't care about.


Design Mistakes (35% of failures)

These require some technical judgment to fix, but the fix is almost always "do less."

Pitfall 5: Over-engineering (the resume-driven design)

What it looks like:

The design includes: API Gateway, service mesh, 8 microservices, CQRS, event sourcing, Kafka, Redis, Cassandra, PostgreSQL, Elasticsearch, a CDN, and a custom ML recommendation engine. For a URL shortener.

Why it fails:

Every component adds operational cost: monitoring, debugging, deployment pipelines, failure modes. A URL shortener with 1K req/sec needs one API server, one database, and maybe a cache. Adding Kafka to this is like buying a semi-truck to commute to work.

The interviewer's internal question: "Would I trust this person to make build vs. buy decisions? Or would they over-complicate every project?"

The fix:

Start with the simplest architecture that meets requirements. Add complexity only when a specific number demands it. "I'm adding a cache because our 50K reads/sec exceeds the database's 10K capacity" is a justified addition. "I'm adding a cache because caches are good" is not.

The rule: every component must be justified by a requirement or an estimate. If you can't justify it in one sentence, remove it.

Pitfall 6: Under-designing (the hand-wavy architecture)

What it looks like:

"So users connect to some servers, which talk to a database. We'll cache some stuff. If it gets really big we'll shard."

No specific databases named. No data model discussed. No data flow traced. No numbers anywhere. The design is a collection of generic boxes that could be any system for any purpose.

Why it fails:

Generic designs show zero engineering decision-making. The interviewer can't tell if you understand why you'd choose PostgreSQL vs. Cassandra, when to shard vs. when to cache, or how data actually flows through the system.

The fix:

Name everything. "PostgreSQL for the order data because we need ACID transactions." "Redis for session storage because sessions are ephemeral and need sub-ms reads." "Kafka for the event pipeline because we need replay capability and at-least-once delivery." Specificity is the signal.

Pitfall 7: Ignoring non-functional requirements

What it looks like:

The candidate builds a system that handles the functional requirements (users can post, users can read feeds) but never mentions latency, availability, consistency, or scale. The design has no numbers attached to it.

Why it fails:

Two designs can look identical on a whiteboard and be completely different systems depending on non-functional requirements. A feed that tolerates 5-second staleness uses a different caching strategy than one requiring real-time updates. A system with 99.9% SLA has different redundancy than one with 99.99%.

The fix:

State non-functional requirements explicitly in Phase 2 (NFRs) and reference them during design. When adding a component, link it to an NFR: "I'm adding Redis to meet the 200ms p99 latency requirement on feed reads. Without caching, DB round-trip latency would be 50-100ms per query, and we make 3 queries per feed load."
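The latency reasoning in that justification can be checked with quick arithmetic. A small sketch using the illustrative figures quoted in the text (50-100ms DB round-trips, 3 queries per feed load, 200ms p99 target):

```python
# Latency budget check: does the uncached path meet the 200ms p99 target?
# Figures are the illustrative ones from the text.

p99_target_ms = 200
db_roundtrip_ms = (50, 100)        # per-query DB latency, best and worst case
queries_per_feed_load = 3

uncached_ms = tuple(rt * queries_per_feed_load for rt in db_roundtrip_ms)
print(uncached_ms)                 # (150, 300) -> worst case blows the budget

# The worst-case uncached path exceeds the p99 target, which is exactly
# the one-sentence justification the cache needs.
assert uncached_ms[1] > p99_target_ms
```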

Pitfall 8: Wrong scaling strategy for the access pattern

What it looks like:

"Our system is read-heavy (100:1 read:write ratio), so I'll shard the database for write scalability." Or: "We have 50K writes/sec, so I'll add a cache."

Why it fails:

Caching helps reads, not writes. Sharding helps writes (and reads from the partition key), not arbitrary reads. Read replicas help reads, not writes. Each scaling technique has a specific purpose. Applying the wrong one wastes interview time and shows a gap in understanding.

The fix:

Match the scaling technique to the bottleneck:

  • Too many reads: cache, read replicas, CDN
  • Too many writes: sharding, write-optimized DB, async with queues
  • Too much storage: object storage, archival, data lifecycle policies
  • Too high latency: cache, CDN, edge computing, multi-region
  • Too low availability: replicas, multi-AZ, multi-region, circuit breakers
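As a rehearsal self-check, the bottleneck-to-technique mapping can be encoded as a simple lookup. This is a sketch; the key names are paraphrases of the table rows, not canonical terms:

```python
# Bottleneck -> scaling technique lookup, mirroring the table above.
# Key names are paraphrased labels, used only for this self-check.

SCALING_TECHNIQUES = {
    "too_many_reads":       ["cache", "read replicas", "CDN"],
    "too_many_writes":      ["sharding", "write-optimized DB", "async queues"],
    "too_much_storage":     ["object storage", "archival", "lifecycle policies"],
    "too_high_latency":     ["cache", "CDN", "edge computing", "multi-region"],
    "too_low_availability": ["replicas", "multi-AZ", "multi-region", "circuit breakers"],
}

def techniques_for(bottleneck: str) -> list[str]:
    """Return the techniques that actually address the named bottleneck."""
    return SCALING_TECHNIQUES[bottleneck]

# A 100:1 read-heavy system bottlenecks on reads -> caching, not sharding.
print(techniques_for("too_many_reads"))  # ['cache', 'read replicas', 'CDN']
```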

Pitfall 9: No failure mode discussion

What it looks like:

The entire design assumes everything works. No mention of what happens when Redis goes down, the database fails over, the message queue backs up, or a service throws errors.

Why it fails:

In production, every component fails. The mark of an experienced engineer is designing with failure in mind. An entire interview without mentioning failure modes tells the interviewer: "This person hasn't operated systems in production, or if they have, they don't think about resilience proactively."

The fix:

For each critical component, state the failure mode and your mitigation in one sentence:

  • "If Redis goes down, we fall back to the database with degraded latency. I'd add a local in-process cache as L1 to survive cache outages."
  • "If the write service goes down, events queue in Kafka and replay on recovery. No data loss."
  • "If a third-party payment API has latency spikes, we have a 3-second timeout with circuit breaker. After 5 timeouts in 30 seconds, we fail open and queue payments for retry."

You don't need to cover every failure. Cover the most likely failure (cache down) and the most damaging failure (database down). Two sentences each.
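The payment-API mitigation above (open the breaker after 5 timeouts in 30 seconds, then queue for retry) can be sketched as a minimal sliding-window circuit breaker. This is an illustrative sketch, not a production implementation; the class name, cooldown value, and half-open behavior are assumptions:

```python
import time
from collections import deque

class CircuitBreaker:
    """Minimal sliding-window circuit breaker (illustrative sketch)."""

    def __init__(self, max_failures=5, window_sec=30, cooldown_sec=60):
        self.max_failures = max_failures
        self.window_sec = window_sec
        self.cooldown_sec = cooldown_sec
        self.failures = deque()      # timestamps of recent failures
        self.opened_at = None        # None while the breaker is closed

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_sec:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.opened_at = now     # too many recent failures: open

    def allow_request(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at > self.cooldown_sec:
            self.opened_at = None    # half-open: let a probe request through
            self.failures.clear()
            return True
        return False                 # open: caller should queue for retry

breaker = CircuitBreaker()
for t in range(5):                   # 5 timeouts within 30 seconds...
    breaker.record_failure(now=float(t))
print(breaker.allow_request(now=5.0))   # False -> queue the payment instead
```

The caller still enforces the 3-second timeout itself; the breaker only decides whether the next request is attempted at all.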


Communication Mistakes (25% of failures)

You can know everything and still fail if you can't communicate it.

Pitfall 10: Silent drawing

What it looks like:

The candidate draws boxes and arrows for 5 minutes without saying a word. The interviewer watches a diagram materialize with no context for why any component is there.

Why it fails:

The interviewer can't evaluate your thinking if they can't hear it. A box labeled "Redis" tells them nothing. Why Redis and not Memcached? Why a cache at all? What data is cached? What's the TTL? What's the hit rate expectation?

The fix:

Narrate while drawing. Every box gets a one-sentence justification spoken out loud: "I'm adding Redis here as our cache layer. It'll store the hot product catalog data with a 5-minute TTL. At 95% hit rate, this absorbs 95% of our 50K reads/sec and keeps the DB load at a manageable 2.5K reads/sec."
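That narration's hit-rate arithmetic is worth having at your fingertips; a two-line sketch using the numbers quoted above:

```python
# Cache hit-rate arithmetic from the narration above: at a 95% hit rate,
# only the 5% misses out of 50K reads/sec ever reach the database.

total_reads_per_sec = 50_000
hit_rate = 0.95

db_reads_per_sec = total_reads_per_sec * (1 - hit_rate)
print(f"{db_reads_per_sec:,.0f} reads/sec reach the DB")  # 2,500 reads/sec
```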

Pitfall 11: Not labeling arrows

What it looks like:

The diagram has boxes connected by arrows, but no arrow has a label. It's unclear what data flows between components, what protocol is used, or whether the connection is synchronous or asynchronous.

Why it fails:

Unlabeled arrows mean the interviewer must ask about every connection: "What does this arrow represent?" That's a waste of both your time and theirs. It also suggests you don't think about data flow, you just draw boxes.

The fix:

Every arrow gets a label: the protocol (HTTP, gRPC, TCP), the data (user profile, feed items, events), and the pattern (sync, async, streaming). Example: "REST: POST /tweet" or "Kafka: feed-events topic" or "gRPC: getUser()".

Pitfall 12: Jargon without explanation

What it looks like:

"We'll use consistent hashing with virtual nodes for the sharding strategy." No explanation of what consistent hashing is, why it's needed here, or what virtual nodes solve.

Why it fails:

Two possible interpretations: (1) you understand it deeply and forgot the interviewer needs context, or (2) you memorized the term without understanding it. The interviewer will probe to find out which, and if it's (2), trust evaporates.

The fix:

First use of any term: one-sentence definition. "Consistent hashing distributes data across nodes so that adding or removing a node only reshuffles about 1/N of the data instead of all of it. That's important here because our system will need to add shards as traffic grows, and full reshuffles cause downtime."
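The "about 1/N of the data" claim can even be demonstrated with a toy. Below is a minimal consistent-hashing sketch (class and variable names are illustrative, and MD5 is used only for deterministic hashing, not security) showing that adding a fourth node moves only roughly a quarter of the keys:

```python
import hashlib
from bisect import bisect

# Minimal consistent-hashing sketch with virtual nodes (illustrative only).
# Keys and node replicas hash onto a ring; a key belongs to the first
# node point clockwise from its hash.

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node contributes `vnodes` points to smooth the distribution.
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def node_for(self, key: str) -> str:
        idx = bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"user:{i}" for i in range(10_000)]
before = HashRing(["n1", "n2", "n3"])
after = HashRing(["n1", "n2", "n3", "n4"])

moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 25%, i.e. ~1/N for N=4
```

With naive modulo hashing (`hash(key) % num_nodes`), the same change would remap nearly every key, which is exactly the downtime-causing reshuffle the one-sentence definition warns about.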

Pitfall 13: No trade-off articulation

What it looks like:

"I'll use Cassandra for the feed data." No mention of what alternatives were considered, why Cassandra over PostgreSQL, or what you give up by choosing it.

Why it fails:

Every design choice has trade-offs. Cassandra gives you write throughput and horizontal scalability but gives up strong consistency and complex queries. If you don't articulate this, the interviewer doesn't know if you considered the alternatives or just picked the first thing that came to mind.

The fix:

For every major design decision: "I chose X over Y because Z. The trade-off is we lose A, but that's acceptable because B."

Example: "I'm using Cassandra instead of PostgreSQL for the feed store. Cassandra optimizes for write throughput (we need 50K writes/sec for feed fanout), and our feed reads are simple key lookups by user_id, which Cassandra handles well. The trade-off: we lose ad-hoc SQL queries and ACID transactions on feed data, but feeds don't need either. For the user profile data, I'll keep PostgreSQL because we do need relational queries and ACID there."

Interview tip: the trade-off formula

Memorize this sentence template: "I chose [X] over [Y] because [Z]. We give up [A], which is acceptable because [B]." Use it for every major decision. It takes 15 seconds and transforms a hand-wavy choice into an engineering decision.

Pitfall 14: Not using the whiteboard effectively

What it looks like:

Tiny, cramped diagrams in one corner. Illegible handwriting. Components drawn in random order so the data flow isn't visually clear. Or in virtual interviews: not using the shared drawing tool at all, just describing the architecture verbally.

Why it fails:

The diagram is the artifact the interviewer keeps in their notes. If it's illegible, they can't write a clear assessment. If there's no diagram, they have to reconstruct your design from memory during the debrief. Neither is good for you.

The fix:

Use the full board. Draw left-to-right or top-to-bottom, following the data flow direction. Leave space between components for labels. Use the top for requirements and estimates, the center for the architecture, and the margins for notes and trade-off decisions. For virtual interviews: use the drawing tool even if you're slow at it. A messy diagram is better than no diagram.

Pitfall 15: Not knowing when to stop

What it looks like:

The candidate keeps adding details, components, and edge cases long after making their point. The deep dive on the feed service includes threading models, JVM garbage collection tuning, and Kubernetes pod resource limits.

Why it fails:

Going too deep into implementation details signals you can't distinguish architecture from implementation. System design interviews care about which components and why, not about container orchestration settings.

The fix:

For each deep dive, stop at the architectural level. "The feed service is a stateless Go service that reads from Redis and falls back to Cassandra. It pre-fetches the next page of results to minimize latency. I'd run 10 instances behind a load balancer." That's sufficient. If the interviewer wants to go deeper, they'll ask. Don't pre-empt their questions by diving into implementation yourself.


The Pre-Interview Checklist

Review this 10 minutes before every interview:

  1. Announce structure at start: "Requirements, APIs, flows, architecture, deep dives."
  2. Propose scope, don't ask for it: "I'll focus on X, Y, Z. Leaving A, B out of scope."
  3. Cap estimation at 5 minutes: only compute numbers that drive decisions.
  4. Justify every component: "I added X because [estimate] requires it."
  5. Label every arrow: protocol + data + sync/async.
  6. State trade-offs at each decision: "Chose X over Y because Z. We lose A."
  7. Check in after each phase: "Does this look right? What should we dive into?"
  8. Cover failure modes for critical components: "If X goes down, we degrade to Y."
  9. Start simple, add complexity: don't open with microservices for an MVP.
  10. Let the interviewer steer deep dives: "Which area interests you most?"

Interview tip: the mental model shift

You're not a student being tested. You're a staff engineer leading a design discussion with a colleague. The interviewer is your teammate, not your evaluator (even though they literally are your evaluator). When you shift into "tech lead running a meeting" mode, the right behaviors (structuring, checking in, justifying, collaborating) happen naturally.


How This Shows Up in Interviews

The irony of this article: you won't explicitly "apply" these lessons. If you've internalized them, they'll manifest as absence of mistakes rather than presence of techniques. The interviewer won't think "great structure." They'll think "this candidate is really clear and organized." They won't think "good trade-off articulation." They'll think "this person really understands the engineering decisions."

That's the goal. The framework disappears into fluency.

The post-interview self-assessment

After every practice session or real interview, run through the 15 pitfalls and honestly ask: "Did I do any of these?" If yes, practice specifically against that pitfall next time. Most candidates have 2-3 bad habits that appear in every session. Identifying yours and fixing them is the highest-leverage preparation you can do.



Quick Recap

  1. The top interview failure isn't wrong architecture; it's wrong process. Announce your structure, cap estimation at 5 minutes, and check in with the interviewer after each phase.
  2. Over-engineering is worse than under-engineering. Start simple, add complexity only when numbers demand it.
  3. Every component needs a one-sentence justification tied to a requirement or estimate.
  4. Every design decision needs a trade-off statement: "Chose X over Y because Z. We lose A."
  5. Cover failure modes for critical components. Two sentences per component: what breaks and what happens.
  6. Narrate while designing. Silent drawing prevents the interviewer from evaluating your thinking.
  7. The framework section is complete: Approach, Estimation, Capacity Planning, and this Pitfalls guide give you everything you need for HLD interviews. Practice until the framework disappears into fluency.

Related Concepts

  • Approach & Structure - The 6-phase framework that prevents process mistakes. If you follow this structure, Pitfalls 1-4 are eliminated.
  • Estimation - The technique that prevents Pitfall 2 (estimation spiral) and Pitfall 5 (over-engineering). Numbers are the filter for justified complexity.
  • Capacity Planning - Translates estimates into decisions, preventing Pitfall 8 (wrong scaling strategy) and Pitfall 6 (under-designing).
  • Scalability - Understanding the scaling ladder ensures you pick the right technique (Pitfall 8).
  • Microservices - When to use microservices vs. a monolith. The answer is "later than you think" for most systems.
