Non-functional requirements (NFRs) drive architecture more than features do. The same feature ("user views their feed") produces completely different systems depending on the latency, consistency, and availability targets.
The 6 core NFRs to clarify in every interview: Availability, Latency, Throughput, Consistency, Durability, and Scalability. Nail these six and you've covered 90% of architecture-driving constraints.
Every NFR must be quantified. "High availability" is useless. "99.99% availability" tells you exactly what infrastructure you need (multi-AZ, no single points of failure).
NFRs trade off against each other. You cannot have 99.999% availability AND strong consistency AND sub-10ms latency. The magic of senior-level design is choosing which NFRs to optimize and which to relax.
Use the NFR sentence template to anchor your design: "We need X availability with Y latency for Z throughput, accepting [consistency model] for [data type]."
Consider this functional requirement: "Users can view a product's current price."
For an internal analytics dashboard with 50 users, this is a SQL query against a single PostgreSQL instance. Response time of 500ms is fine. If the database is down for 10 minutes during a deploy, nobody panics.
For Amazon's product page with 300 million monthly visitors, this is a globally replicated cache layer backed by a distributed database, served from CDN edge locations, with sub-50ms P99 latency, 99.99% availability, and the ability to handle 500,000 reads per second during Prime Day.
Same feature. Same functional requirement. Completely different systems. The difference? Non-functional requirements.
I've watched candidates design an entire architecture without ever asking about latency, availability, or consistency targets. The architecture looks reasonable in isolation, but the interviewer can't evaluate it because they don't know what constraints the candidate was designing for. Was that Redis cache there for latency reasons? For throughput? The candidate never said.
NFRs are the bridge between "what the system does" and "how the system is built." Without them, your architecture decisions are arbitrary.
The 'reasonable defaults' trap
Don't assume the interviewer has the same defaults you do. "Reasonable latency" might mean 200ms to you and 50ms to them. "High availability" might mean 99.9% to you and 99.99% to them. That 0.09% difference is the difference between multi-AZ and multi-region. Always quantify.
These are the six non-functional requirements that appear in nearly every system design interview. For each one, I'll cover what it means, concrete thresholds you should know, the architecture changes it forces, and how to phrase the question in an interview.
Availability

99.99% availability means every component must be redundant. No single database. No single load balancer. No single cache node. Every failure domain needs an independent backup. This is where you introduce multi-AZ deployments, read replicas, Redis Sentinel/Cluster, and health-check-driven routing.
99.999% pushes you to multi-region active-active architectures, which introduce consistency challenges (data written in US-East takes time to reach EU-West).
How to ask in an interview:
"What availability target should I design for? Is this a 99.9% SLA (internal tool level) or 99.99% (user-facing product level)?"
I'll often see candidates say "high availability" without a number. That's a missed opportunity. Saying "99.99% availability" and then designing multi-AZ failover to match is what separates a senior answer from a mid-level one.
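These targets map to concrete downtime budgets that are worth having memorized. A quick sketch to derive them (plain arithmetic, assuming a 365.25-day year):

```python
def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability target."""
    minutes_per_year = 365.25 * 24 * 60
    return (1 - availability) * minutes_per_year

for target in (0.999, 0.9999, 0.99999):
    print(f"{target * 100:.3f}% -> {downtime_budget_minutes(target):6.1f} min/year")
# 99.9% allows roughly 8.8 hours/year; 99.99% roughly 52.6 minutes;
# 99.999% barely 5.3 minutes -- hence the push toward automated multi-region failover.
```

Quoting the budget ("99.99% gives us about 52 minutes of downtime a year") is an easy way to show the number is not arbitrary.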
Latency

Sub-50ms P99 means you can't afford a cache miss hitting a slow database. You need a cache-first read path with high hit rates (95%+). It also means no cross-region round trips (those add 50-150ms alone).
Sub-200ms is the sweet spot for most user-facing APIs. You get a budget of 2-3 sequential I/O operations. This is where careful service decomposition matters: every additional service hop eats into your latency budget.
How to ask in an interview:
"What's the acceptable response time for this operation? Should I target sub-100ms (aggressive, cache-first) or is 200-500ms acceptable?"
Interview tip: always say P99, never say average
Averages hide tail latency. A system with 50ms average and 5-second P99 has a terrible user experience for 1% of requests. Mentioning P99 (or P95) shows you understand real-world performance characteristics. If you want to show even more depth, mention that P99.9 matters for high-fan-out systems where tail latency amplification turns a 1-in-1000 slow response into a near-certainty.
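Both claims in this tip are easy to demonstrate numerically. The latency distribution below is synthetic, constructed so the effect is exact rather than simulated:

```python
# Synthetic distribution: 99% of requests take 50ms, 1% hit a 5-second tail.
samples = sorted([50.0] * 990 + [5000.0] * 10)

avg = sum(samples) / len(samples)
p99 = samples[int(len(samples) * 0.99)]
print(f"average = {avg:.1f}ms, P99 = {p99:.0f}ms")  # 99.5ms vs 5000ms

# Tail amplification: a page that fans out to N backends is slow whenever
# ANY of them hits its P99.
for fan_out in (1, 10, 100):
    p_slow = 1 - 0.99 ** fan_out
    print(f"fan-out {fan_out:>3}: {p_slow:.0%} of requests see a P99-slow call")
```

At fan-out 100, a "1-in-100" slow response happens on roughly 63% of requests, which is exactly the amplification the tip describes.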
Throughput

At 100 RPS, almost any architecture works. At 10K RPS, you need horizontal scaling of the app tier and a caching layer that absorbs the majority of reads. At 100K+ RPS, you need CDN offloading for static content, sharded databases for writes, and possibly event streaming (Kafka) to decouple write-heavy paths.
The key insight: throughput requirements determine whether you need horizontal scaling and at which layer.
How to ask in an interview:
"What's the expected peak request volume? Should I design for thousands of RPS (moderate scale) or hundreds of thousands (extreme scale)?"
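If the interviewer gives you users instead of RPS, convert with a back-of-envelope estimate. Every input below is an assumption you'd state out loud:

```python
# Back-of-envelope peak RPS from user counts (all inputs are assumptions).
daily_active_users = 100_000_000
reads_per_user_per_day = 50          # feed refreshes, page views, etc.
peak_to_average_ratio = 3            # traffic clusters in peak hours

average_rps = daily_active_users * reads_per_user_per_day / 86_400
peak_rps = average_rps * peak_to_average_ratio
print(f"average = {average_rps:,.0f} RPS, peak = {peak_rps:,.0f} RPS")
# Roughly 58K average and 174K peak -- firmly in "extreme scale" territory.
```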
Consistency

Consistency determines how fresh the data must be when a user reads it. This is the NFR that most directly constrains your database and replication choices.
The spectrum:
| Model | Meaning | Use cases |
| --- | --- | --- |
| Strong consistency | Every read returns the most recent write | Inventory counts, bank balances, permission checks |
| Bounded staleness | Reads may lag by a defined time window | Social feeds (5-second lag), search indexes (30-second lag) |
| Eventual consistency | Reads will eventually reflect writes, no time guarantee | Like counts, view counters, analytics |
What it forces in your architecture:
Strong consistency means single-writer primary for that data, synchronous replication (or consensus like Raft), and no caching of mutable data. It kills multi-region active-active for that data path.
Eventual consistency unlocks read replicas, async replication, caching with TTLs, and multi-region deployments. Almost every high-availability, low-latency architecture relies on eventual consistency for at least some data paths.
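The "caching with TTLs" point deserves a concrete form: a TTL cache is bounded staleness, because the TTL is exactly the staleness window you promised. A toy in-process sketch (not a production cache):

```python
import time

class TTLCache:
    """Toy read-through cache: the TTL is the staleness bound you promised."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, load):
        value, expires_at = self._store.get(key, (None, 0.0))
        if time.monotonic() >= expires_at:       # missing or past the staleness bound
            value = load(key)                    # go to the source of truth
            self._store[key] = (value, time.monotonic() + self.ttl)
        return value

# A 5-second TTL implements "bounded staleness: reads lag writes by at most 5s".
cache = TTLCache(ttl_seconds=5.0)
price = cache.get("sku-123", lambda key: 19.99)  # miss -> loads; hits for the next 5s
```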
How to ask in an interview:
"Is it acceptable for users to see slightly stale data? For a social feed, a 5-second delay is usually fine. For inventory counts or financial transactions, we'd need strong consistency."
Here's the honest answer most candidates miss: consistency is not a system-wide setting. Different data within the same system has different consistency requirements. Tweets can be eventual. Follower counts can be approximate. But a financial transaction ledger must be strongly consistent.
Durability

Durability determines how much data loss is acceptable during failures. This is separate from consistency (durability is about writes surviving failures, consistency is about reads reflecting writes).
The levels:

| Level | Meaning | Architecture |
| --- | --- | --- |
| Zero loss | No acknowledged write is ever lost | Synchronous replication to multiple locations, consensus (Raft), acks=all |
| Bounded loss | Writes from the last few seconds may be lost | Async replication with bounded lag, periodic snapshots |
| Best effort | Some loss is acceptable | In-memory caches with periodic flush, write-behind buffers |
What it forces in your architecture:
Zero data loss means synchronous writes to at least two durable storage locations before acknowledging the client. For databases, this means synchronous replication or Raft consensus. For message queues, this means acks=all in Kafka (wait for all in-sync replicas). This adds latency to every write.
If you can tolerate some loss (analytics events, view counts), you can use async replication, write-behind caches, and single-node writes with periodic snapshots. Much faster, but you accept that a node failure loses recent writes.
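As a concrete sketch of the Kafka case above: the durability knobs span the producer and the topic. The values below are illustrative, and min.insync.replicas is a topic/broker setting rather than a producer one:

```properties
# Producer side: wait for all in-sync replicas before acknowledging the write
acks=all
enable.idempotence=true

# Topic/broker side: an ack requires at least 2 in-sync replicas;
# with replication.factor=3 this tolerates one broker failure
min.insync.replicas=2
```

The latency cost in L57 is visible here: every produce now waits on multiple brokers before the client gets its ack.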
How to ask in an interview:
"Can any recent writes be lost in a failure scenario? Financial transactions probably need zero loss. Analytics events or view counts might tolerate losing the last few seconds."
Scalability

Scalability is the system's ability to handle growth in users, data, or traffic without architectural changes. It's less about a specific number and more about the growth trajectory.
The dimensions:
| Dimension | Question | Architecture impact |
| --- | --- | --- |
| User growth | 10x users in 12 months? | Stateless app tier, horizontal scaling, CDN |
| Data growth | How fast does storage grow? | Sharding strategy, data lifecycle/archival, tiered storage |
| Traffic spikes | Predictable or bursty? | Auto-scaling, pre-scaling, queue buffering |
What it forces in your architecture:
If the system needs to handle 10x growth, every component must be horizontally scalable or have a clear scaling path. Stateless app servers behind a load balancer. Shardable databases. Distributed caches. No single-writer bottlenecks.
If traffic is bursty (flash sales, viral events), you need either pre-scaling (scale up before the event), auto-scaling (react to load), or queue buffering (absorb spikes and process asynchronously).
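The queue-buffering option can be sanity-checked with simple arithmetic: the queue absorbs the burst, and drain time is the backlog divided by spare consumer capacity. All rates below are illustrative assumptions:

```python
# Queue buffering during a burst -- all rates are illustrative assumptions.
steady_rps = 5_000       # normal inbound write rate
burst_rps = 50_000       # flash-sale peak inbound rate
burst_seconds = 60
consumer_rps = 20_000    # sustained processing rate of the workers

backlog = (burst_rps - consumer_rps) * burst_seconds        # queued during the burst
drain_seconds = backlog / (consumer_rps - steady_rps)       # time to empty afterwards
print(f"backlog = {backlog:,} messages, drained in {drain_seconds:.0f}s")
# The queue converts a 10x spike into ~2 extra minutes of processing delay,
# instead of dropped writes or an emergency scale-up.
```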
How to ask in an interview:
"Should I design for current scale or anticipate 10x growth? And is traffic steady or do we expect spikes (events, campaigns, viral content)?"
Here's the part that separates good designers from great ones: NFRs are in tension. Improving one often degrades another. The art of system design is choosing which NFRs to optimize and which to relax.
The key tensions:
Availability vs Consistency (CAP theorem). During a network partition, you choose: reject writes (consistent but unavailable) or accept writes on both sides (available but inconsistent). Every distributed system makes this tradeoff.
Latency vs Durability. Synchronous replication (for durability) adds latency to every write. Async replication is faster but risks data loss. You're trading write speed for write safety.
Consistency vs Latency. Strong consistency requires coordination (consensus rounds, synchronous replication). That coordination adds latency. Eventual consistency lets you read from the nearest replica instantly.
Throughput vs Consistency. Higher throughput often means more replicas and partitions. More replicas means more coordination overhead for strong consistency. At extreme throughput, eventual consistency becomes almost mandatory.
My recommendation: state the tensions explicitly in your interview. "I'm choosing eventual consistency here because it lets me achieve the sub-200ms latency target. If we needed strong consistency, we'd need to accept higher latency or reduce the number of replicas."
Interview tip: name the tradeoff before making the choice
Don't just pick a consistency model. Say: "There's a tension between our 99.99% availability target and strong consistency. During a network partition, I can't have both. For timeline reads, I'm choosing availability and accepting eventual consistency with a 5-second staleness window. For inventory updates, I'd choose consistency and accept brief unavailability." This shows you understand CAP theorem in practice, not just in theory.
After gathering your NFRs, synthesize them into a single sentence that anchors your entire design. This is the sentence you say out loud before drawing your first architecture box.
The template:
"We need [availability target] availability with [latency target] latency for [throughput target] throughput, accepting [consistency model] for [data type], with [durability guarantee] for writes."
Examples:
For a social media timeline:
"We need 99.99% availability with P99 under 200ms for 100K timeline reads/sec, accepting eventual consistency with a 5-second staleness window, with durable writes (no tweet loss once acknowledged)."
For a payment processing system:
"We need 99.99% availability with P99 under 500ms for 10K transactions/sec, requiring strong consistency for account balances, with zero data loss for all financial writes."
For an analytics dashboard:
"We need 99.9% availability with P99 under 2 seconds for 1K queries/sec, accepting eventual consistency with 30-second staleness, tolerating bounded write loss for raw events."
Notice how each sentence produces a completely different architecture. The social media timeline needs a cache-first read path. The payment system needs synchronous replication and consensus. The analytics dashboard can use batch processing and approximate queries.
For your interview: say the NFR sentence out loud before drawing anything. It gives the interviewer a clear rubric to evaluate your architecture against.
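The template is mechanical enough to encode, which can help you check that no slot is missing. A small sketch (the class and field names are my own, not a standard):

```python
from dataclasses import dataclass

@dataclass
class NFRProfile:
    availability: str   # e.g. "99.99%"
    latency: str        # e.g. "P99 under 200ms"
    throughput: str     # e.g. "100K timeline reads/sec"
    consistency: str    # e.g. "eventual consistency with a 5-second staleness window"
    durability: str     # e.g. "durable writes (no loss once acknowledged)"

    def sentence(self) -> str:
        return (f"We need {self.availability} availability with {self.latency} latency "
                f"for {self.throughput} throughput, accepting {self.consistency}, "
                f"with {self.durability}.")

# The social media timeline example, expressed through the template.
timeline = NFRProfile(
    availability="99.99%",
    latency="P99 under 200ms",
    throughput="100K timeline reads/sec",
    consistency="eventual consistency with a 5-second staleness window",
    durability="durable writes (no tweet loss once acknowledged)",
)
print(timeline.sentence())
```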
This is the nuance that separates senior from staff-level thinking. Most candidates apply NFRs uniformly across the entire system. In reality, different operations within the same system have different NFR profiles.
Example: E-commerce platform
| Operation | Availability | Latency | Consistency | Durability |
| --- | --- | --- | --- | --- |
| Browse product catalog | 99.99% | P99 < 100ms | Eventual (30s stale OK) | N/A (read-only) |
| Add to cart | 99.99% | P99 < 200ms | Session-consistent | Best effort (cart in Redis) |
| Checkout / payment | 99.99% | P99 < 1s | Strong | Zero loss |
| View order history | 99.9% | P99 < 500ms | Eventual (1 min stale OK) | N/A (read-only) |
| Search products | 99.9% | P99 < 300ms | Eventual (minutes stale OK) | N/A (read-only) |
Notice that product browsing and checkout have completely different consistency and durability requirements. Designing both with strong consistency wastes resources. Designing both with eventual consistency risks selling items you don't have.
The right approach: identify the 2 to 3 operations with the most demanding NFRs and design your architecture around those. The less demanding operations can ride on simpler paths.
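In code, per-operation consistency often surfaces as read-routing policy: strongly consistent operations must hit the primary, while everything else can use replicas or cache. A minimal sketch (operation names from the table above; the routing rule is an illustrative assumption):

```python
# Consistency requirement per operation, taken from the e-commerce table above.
CONSISTENCY = {
    "browse_catalog": "eventual",
    "add_to_cart": "session",
    "checkout": "strong",
    "order_history": "eventual",
    "search": "eventual",
}

def read_target(operation: str) -> str:
    """Strong reads must hit the primary; weaker reads can use replicas or cache."""
    level = CONSISTENCY[operation]
    if level == "strong":
        return "primary"
    if level == "session":
        return "session-pinned replica"
    return "replica-or-cache"

print(read_target("checkout"))        # primary
print(read_target("browse_catalog"))  # replica-or-cache
```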
Don't say 'strong consistency everywhere'
When a candidate says "I'll use strong consistency for everything," that's a red flag. It means they're not thinking about tradeoffs. Strong consistency for a like counter is overkill that adds latency and reduces availability for no user-visible benefit. Show you can apply different consistency models to different data types within the same system.
Common mistakes

Not quantifying NFRs. "The system should be fast and reliable" is not an NFR specification. Every NFR needs a number: P99 under 200ms, 99.99% availability, 50K RPS peak. Without numbers, your architecture decisions are unjustified.
Treating all NFRs as equally important. Every NFR has a cost. 99.999% availability is 10x more expensive and complex than 99.99%. If you design for five-nines availability on an internal dashboard, you're over-engineering. Prioritize the NFRs that matter most for the specific use case.
Ignoring NFR tensions. If you claim 99.999% availability AND strong consistency AND sub-10ms latency, the interviewer knows you don't understand the tradeoffs. These are in tension. Acknowledge the tension and explain your choice.
Applying NFRs system-wide. As discussed above, different operations need different NFR profiles. Applying the strictest requirement to every operation wastes resources and adds unnecessary complexity.
Not connecting NFRs to architecture decisions. The whole point of NFRs is to justify your design choices. If you say "99.99% availability" but then draw a single-node database, there's a disconnect. Every NFR should map to at least one architecture decision.
Forgetting scalability direction. "It needs to scale" means nothing. Scale what? Reads? Writes? Storage? Users? Each dimension has different solutions. A read-scaling problem (add replicas, caching) is fundamentally different from a write-scaling problem (sharding, partitioning).
NFRs are tested in the first 5 minutes of every system design interview. The interviewer wants to see whether you ask about constraints before designing, and whether you connect those constraints to architecture decisions.
What interviewers evaluate:
| Signal | Mid-level | Senior | Staff |
| --- | --- | --- | --- |
| Asks about NFRs | Mentions 1-2 | Covers all 6 systematically | Identifies per-operation NFR profiles |
| Quantifies NFRs | Vague ("fast") | Specific ("P99 < 200ms") | Justifies each number from estimation |
| Connects NFR to architecture | Implicit | Explicit ("Redis because P99 < 50ms") | Traces the full chain: NFR to component to config |
| Handles tradeoffs | Picks one side | Acknowledges tension | Proposes per-operation tradeoff matrix |
| Adjusts during interview | Rigid | Adapts when interviewer pushes | Proactively offers: "If we relaxed consistency, we could do X" |
Common follow-up questions from interviewers:
| Interviewer asks | Strong response |
| --- | --- |
| "Why did you choose eventual consistency?" | "The 200ms P99 target wouldn't survive the coordination overhead of strong consistency across replicas. A 5-second staleness window is acceptable for timeline reads since users don't notice sub-5s delays." |
| "What happens if availability drops below 99.99%?" | "With 99.99%, we budget 52.6 minutes of downtime per year. If we're burning through that budget, we'd investigate: is it a single AZ failure (covered by multi-AZ), a deploy issue (roll back), or a systemic problem (needs multi-region)?" |
| "Can you make this faster?" | "The main latency contributor is the database read at P99. Options: (1) add a Redis cache for the hot path, reducing P99 from 150ms to 10ms for cache hits, (2) pre-compute the response, eliminating the read entirely, (3) move computation to the edge." |
| "What if we need strong consistency for this?" | "Strong consistency here means single-writer primary with synchronous replication. Write latency goes from 5ms to 20ms. Read latency stays similar if we route reads to the primary. Availability during partitions drops since we'd reject writes rather than risk inconsistency." |
The NFR power move
After stating your NFRs, draw a small table on the whiteboard: NFR on the left, Architecture Decision on the right. Fill it in as you design. "P99 < 200ms maps to Redis cache. 99.99% maps to multi-AZ. 100K RPS maps to horizontal app tier." This table becomes a live design rubric that the interviewer can follow, and it makes your reasoning transparent.
Key takeaways

- NFRs drive architecture more than features do. The same feature produces completely different architectures depending on latency, consistency, and availability targets.
- The 6 core NFRs to clarify in every interview: Availability, Latency, Throughput, Consistency, Durability, and Scalability. Quantify each one.
- NFRs trade off against each other. You cannot simultaneously maximize availability, consistency, and latency. Senior designers choose which to optimize and state the tradeoff explicitly.
- Use the NFR sentence template before designing: "We need X availability with Y latency for Z throughput, accepting [consistency model]."
- NFRs are per-operation, not system-wide. Different operations within the same system have different availability, latency, and consistency requirements.
- Connect every NFR to at least one architecture decision. "Redis because P99 < 50ms." "Multi-AZ because 99.99% availability." "Async replication because eventual consistency is acceptable."
- State NFR tensions before making choices. "There's a tension between latency and durability here. I'm choosing async replication for speed, accepting bounded data loss of up to 5 seconds."