Non-functional requirements (NFRs) drive architecture more than features do. The same feature ("user views their feed") produces completely different systems depending on the latency, consistency, and availability targets.
The 6 core NFRs to clarify in every interview: Availability, Latency, Throughput, Consistency, Durability, and Scalability. Nail these six and you've covered 90% of architecture-driving constraints.
Every NFR must be quantified. "High availability" is useless. "99.99% availability" tells you exactly what infrastructure you need (multi-AZ, no single points of failure).
NFRs trade off against each other. You cannot have 99.999% availability AND strong consistency AND sub-10ms latency. The magic of senior-level design is choosing which NFRs to optimize and which to relax.
Use the NFR sentence template to anchor your design: "We need X availability with Y latency for Z throughput, accepting [consistency model] for [data type]."
Consider this functional requirement: "Users can view a product's current price."
For an internal analytics dashboard with 50 users, this is a SQL query against a single PostgreSQL instance. Response time of 500ms is fine. If the database is down for 10 minutes during a deploy, nobody panics.
For Amazon's product page with 300 million monthly visitors, this is a globally replicated cache layer backed by a distributed database, served from CDN edge locations, with sub-50ms P99 latency, 99.99% availability, and the ability to handle 500,000 reads per second during Prime Day.
Same feature. Same functional requirement. Completely different systems. The difference? Non-functional requirements.
I've watched candidates design an entire architecture without ever asking about latency, availability, or consistency targets. The architecture looks reasonable in isolation, but the interviewer can't evaluate it because they don't know what constraints the candidate was designing for. Was that Redis cache there for latency reasons? For throughput? The candidate never said.
NFRs are the bridge between "what the system does" and "how the system is built." Without them, your architecture decisions are arbitrary.
The 'reasonable defaults' trap
Don't assume the interviewer has the same defaults you do. "Reasonable latency" might mean 200ms to you and 50ms to them. "High availability" might mean 99.9% to you and 99.99% to them. That 0.09% difference is the difference between multi-AZ and multi-region. Always quantify.
These are the six non-functional requirements that appear in nearly every system design interview. For each one, I'll cover what it means, concrete thresholds you should know, the architecture changes it forces, and how to phrase the question in an interview.
Availability

99.99% availability means every component must be redundant. No single database. No single load balancer. No single cache node. Every failure domain needs an independent backup. This is where you introduce multi-AZ deployments, read replicas, Redis Sentinel/Cluster, and health-check-driven routing.
99.999% pushes you to multi-region active-active architectures, which introduce consistency challenges (data written in US-East takes time to reach EU-West).
How to ask in an interview:
"What availability target should I design for? Is this a 99.9% SLA (internal tool level) or 99.99% (user-facing product level)?"
I'll often see candidates say "high availability" without a number. That's a missed opportunity. Saying "99.99% availability" and then designing multi-AZ failover to match is what separates a senior answer from a mid-level one.
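These targets map to concrete downtime budgets that are worth having memorized. A quick sketch to derive them (plain arithmetic, assuming a 365.25-day year):

```python
def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability target."""
    minutes_per_year = 365.25 * 24 * 60
    return (1 - availability) * minutes_per_year

for target in (0.999, 0.9999, 0.99999):
    print(f"{target * 100:.3f}% -> {downtime_budget_minutes(target):6.1f} min/year")
# 99.9% allows roughly 8.8 hours/year; 99.99% roughly 52.6 minutes;
# 99.999% barely 5.3 minutes -- hence the push toward automated multi-region failover.
```

Quoting the budget ("99.99% gives us about 52 minutes of downtime a year") is an easy way to show the number is not arbitrary.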
Latency

Sub-50ms P99 means you can't afford a cache miss hitting a slow database. You need a cache-first read path with high hit rates (95%+). It also means no cross-region round trips (those add 50-150ms alone).
Sub-200ms is the sweet spot for most user-facing APIs. You get a budget of 2-3 sequential I/O operations. This is where careful service decomposition matters: every additional service hop eats into your latency budget.
How to ask in an interview:
"What's the acceptable response time for this operation? Should I target sub-100ms (aggressive, cache-first) or is 200-500ms acceptable?"
Interview tip: always say P99, never say average
Averages hide tail latency. A system with 50ms average and 5-second P99 has a terrible user experience for 1% of requests. Mentioning P99 (or P95) shows you understand real-world performance characteristics. If you want to show even more depth, mention that P99.9 matters for high-fan-out systems where tail latency amplification turns a 1-in-1000 slow response into a near-certainty.
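Both claims in this tip are easy to demonstrate numerically. The latency distribution below is synthetic, constructed so the effect is exact rather than simulated:

```python
# Synthetic distribution: 99% of requests take 50ms, 1% hit a 5-second tail.
samples = sorted([50.0] * 990 + [5000.0] * 10)

avg = sum(samples) / len(samples)
p99 = samples[int(len(samples) * 0.99)]
print(f"average = {avg:.1f}ms, P99 = {p99:.0f}ms")  # 99.5ms vs 5000ms

# Tail amplification: a page that fans out to N backends is slow whenever
# ANY of them hits its P99.
for fan_out in (1, 10, 100):
    p_slow = 1 - 0.99 ** fan_out
    print(f"fan-out {fan_out:>3}: {p_slow:.0%} of requests see a P99-slow call")
```

At fan-out 100, a "1-in-100" slow response happens on roughly 63% of requests, which is exactly the amplification the tip describes.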
Throughput

At 100 RPS, almost any architecture works. At 10K RPS, you need horizontal scaling of the app tier and a caching layer that absorbs the majority of reads. At 100K+ RPS, you need CDN offloading for static content, sharded databases for writes, and possibly event streaming (Kafka) to decouple write-heavy paths.
The key insight: throughput requirements determine whether you need horizontal scaling and at which layer.
How to ask in an interview:
"What's the expected peak request volume? Should I design for thousands of RPS (moderate scale) or hundreds of thousands (extreme scale)?"
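If the interviewer gives you users instead of RPS, convert with a back-of-envelope estimate. Every input below is an assumption you'd state out loud:

```python
# Back-of-envelope peak RPS from user counts (all inputs are assumptions).
daily_active_users = 100_000_000
reads_per_user_per_day = 50          # feed refreshes, page views, etc.
peak_to_average_ratio = 3            # traffic clusters in peak hours

average_rps = daily_active_users * reads_per_user_per_day / 86_400
peak_rps = average_rps * peak_to_average_ratio
print(f"average = {average_rps:,.0f} RPS, peak = {peak_rps:,.0f} RPS")
# Roughly 58K average and 174K peak -- firmly in "extreme scale" territory.
```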
Consistency

Consistency determines how fresh the data must be when a user reads it. This is the NFR that most directly constrains your database and replication choices.
The spectrum:
| Model | Meaning | Use cases |
| --- | --- | --- |
| Strong consistency | Every read returns the most recent write | Inventory counts, bank balances, permission checks |
| Bounded staleness | Reads may lag by a defined time window | Social feeds (5-second lag), search indexes (30-second lag) |
| Eventual consistency | Reads will eventually reflect writes, no time guarantee | Like counts, view counters, analytics |
What it forces in your architecture:
Strong consistency means single-writer primary for that data, synchronous replication (or consensus like Raft), and no caching of mutable data. It kills multi-region active-active for that data path.
Eventual consistency unlocks read replicas, async replication, caching with TTLs, and multi-region deployments. Almost every high-availability, low-latency architecture relies on eventual consistency for at least some data paths.
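The "caching with TTLs" point deserves a concrete form: a TTL cache is bounded staleness, because the TTL is exactly the staleness window you promised. A toy in-process sketch (not a production cache):

```python
import time

class TTLCache:
    """Toy read-through cache: the TTL is the staleness bound you promised."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, load):
        value, expires_at = self._store.get(key, (None, 0.0))
        if time.monotonic() >= expires_at:       # missing or past the staleness bound
            value = load(key)                    # go to the source of truth
            self._store[key] = (value, time.monotonic() + self.ttl)
        return value

# A 5-second TTL implements "bounded staleness: reads lag writes by at most 5s".
cache = TTLCache(ttl_seconds=5.0)
price = cache.get("sku-123", lambda key: 19.99)  # miss -> loads; hits for the next 5s
```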
How to ask in an interview:
"Is it acceptable for users to see slightly stale data? For a social feed, a 5-second delay is usually fine. For inventory counts or financial transactions, we'd need strong consistency."
Here's the honest answer most candidates miss: consistency is not a system-wide setting. Different data within the same system has different consistency requirements. Tweets can be eventual. Follower counts can be approximate. But a financial transaction ledger must be strongly consistent.
Durability

Durability determines how much data loss is acceptable during failures. This is separate from consistency (durability is about writes surviving failures, consistency is about reads reflecting writes).
The levels:

| Level | Meaning | Architecture |
| --- | --- | --- |
| Zero loss | No acknowledged write is ever lost | Synchronous replication to multiple locations, consensus (Raft), acks=all |
| Bounded loss | Writes from the last few seconds may be lost | Async replication with bounded lag, periodic snapshots |
| Best effort | Some loss is acceptable | In-memory caches with periodic flush, write-behind buffers |
What it forces in your architecture:
Zero data loss means synchronous writes to at least two durable storage locations before acknowledging the client. For databases, this means synchronous replication or Raft consensus. For message queues, this means acks=all in Kafka (wait for all in-sync replicas). This adds latency to every write.
If you can tolerate some loss (analytics events, view counts), you can use async replication, write-behind caches, and single-node writes with periodic snapshots. Much faster, but you accept that a node failure loses recent writes.
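As a concrete sketch of the Kafka case above: the durability knobs span the producer and the topic. The values below are illustrative, and min.insync.replicas is a topic/broker setting rather than a producer one:

```properties
# Producer side: wait for all in-sync replicas before acknowledging the write
acks=all
enable.idempotence=true

# Topic/broker side: an ack requires at least 2 in-sync replicas;
# with replication.factor=3 this tolerates one broker failure
min.insync.replicas=2
```

The latency cost in L57 is visible here: every produce now waits on multiple brokers before the client gets its ack.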
How to ask in an interview:
"Can any recent writes be lost in a failure scenario? Financial transactions probably need zero loss. Analytics events or view counts might tolerate losing the last few seconds."
Scalability

Scalability is the system's ability to handle growth in users, data, or traffic without architectural changes. It's less about a specific number and more about the growth trajectory.
The dimensions:
| Dimension | Question | Architecture impact |
| --- | --- | --- |
| User growth | 10x users in 12 months? | Stateless app tier, horizontal scaling, CDN |
| Data growth | How fast does storage grow? | Sharding strategy, data lifecycle/archival, tiered storage |
| Traffic spikes | Predictable or bursty? | Auto-scaling, pre-scaling, queue buffering |
What it forces in your architecture:
If the system needs to handle 10x growth, every component must be horizontally scalable or have a clear scaling path. Stateless app servers behind a load balancer. Shardable databases. Distributed caches. No single-writer bottlenecks.
If traffic is bursty (flash sales, viral events), you need either pre-scaling (scale up before the event), auto-scaling (react to load), or queue buffering (absorb spikes and process asynchronously).
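The queue-buffering option can be sanity-checked with simple arithmetic: the queue absorbs the burst, and drain time is the backlog divided by spare consumer capacity. All rates below are illustrative assumptions:

```python
# Queue buffering during a burst -- all rates are illustrative assumptions.
steady_rps = 5_000       # normal inbound write rate
burst_rps = 50_000       # flash-sale peak inbound rate
burst_seconds = 60
consumer_rps = 20_000    # sustained processing rate of the workers

backlog = (burst_rps - consumer_rps) * burst_seconds        # queued during the burst
drain_seconds = backlog / (consumer_rps - steady_rps)       # time to empty afterwards
print(f"backlog = {backlog:,} messages, drained in {drain_seconds:.0f}s")
# The queue converts a 10x spike into ~2 extra minutes of processing delay,
# instead of dropped writes or an emergency scale-up.
```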
How to ask in an interview:
"Should I design for current scale or anticipate 10x growth? And is traffic steady or do we expect spikes (events, campaigns, viral content)?"
Here's the part that separates good designers from great ones: NFRs are in tension. Improving one often degrades another. The art of system design is choosing which NFRs to optimize and which to relax.
The key tensions:
Availability vs Consistency (CAP theorem). During a network partition, you choose: reject writes (consistent but unavailable) or accept writes on both sides (available but inconsistent). Every distributed system makes this tradeoff.
Latency vs Durability. Synchronous replication (for durability) adds latency to every write. Async replication is faster but risks data loss. You're trading write speed for write safety.
Consistency vs Latency. Strong consistency requires coordination (consensus rounds, synchronous replication). That coordination adds latency. Eventual consistency lets you read from the nearest replica instantly.
Throughput vs Consistency. Higher throughput often means more replicas and partitions. More replicas means more coordination overhead for strong consistency. At extreme throughput, eventual consistency becomes almost mandatory.
My recommendation: state the tensions explicitly in your interview. "I'm choosing eventual consistency here because it lets me achieve the sub-200ms latency target. If we needed strong consistency, we'd need to accept higher latency or reduce the number of replicas."
Interview tip: name the tradeoff before making the choice
Don't just pick a consistency model. Say: "There's a tension between our 99.99% availability target and strong consistency. During a network partition, I can't have both. For timeline reads, I'm choosing availability and accepting eventual consistency with a 5-second staleness window. For inventory updates, I'd choose consistency and accept brief unavailability." This shows you understand CAP theorem in practice, not just in theory.
After gathering your NFRs, synthesize them into a single sentence that anchors your entire design. This is the sentence you say out loud before drawing your first architecture box.
The template:
"We need [availability target] availability with [latency target] latency for [throughput target] throughput, accepting [consistency model] for [data type], with [durability guarantee] for writes."
Examples:
For a social media timeline:
"We need 99.99% availability with P99 under 200ms for 100K timeline reads/sec, accepting eventual consistency with a 5-second staleness window, with durable writes (no tweet loss once acknowledged)."
For a payment processing system:
"We need 99.99% availability with P99 under 500ms for 10K transactions/sec, requiring strong consistency for account balances, with zero data loss for all financial writes."
For an analytics dashboard:
"We need 99.9% availability with P99 under 2 seconds for 1K queries/sec, accepting eventual consistency with 30-second staleness, tolerating bounded write loss for raw events."
Notice how each sentence produces a completely different architecture. The social media timeline needs a cache-first read path. The payment system needs synchronous replication and consensus. The analytics dashboard can use batch processing and approximate queries.
For your interview: say the NFR sentence out loud before drawing anything. It gives the interviewer a clear rubric to evaluate your architecture against.
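The template is mechanical enough to encode, which can help you check that no slot is missing. A small sketch (the class and field names are my own, not a standard):

```python
from dataclasses import dataclass

@dataclass
class NFRProfile:
    availability: str   # e.g. "99.99%"
    latency: str        # e.g. "P99 under 200ms"
    throughput: str     # e.g. "100K timeline reads/sec"
    consistency: str    # e.g. "eventual consistency with a 5-second staleness window"
    durability: str     # e.g. "durable writes (no loss once acknowledged)"

    def sentence(self) -> str:
        return (f"We need {self.availability} availability with {self.latency} latency "
                f"for {self.throughput} throughput, accepting {self.consistency}, "
                f"with {self.durability}.")

# The social media timeline example, expressed through the template.
timeline = NFRProfile(
    availability="99.99%",
    latency="P99 under 200ms",
    throughput="100K timeline reads/sec",
    consistency="eventual consistency with a 5-second staleness window",
    durability="durable writes (no tweet loss once acknowledged)",
)
print(timeline.sentence())
```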
This is the nuance that separates senior from staff-level thinking. Most candidates apply NFRs uniformly across the entire system. In reality, different operations within the same system have different NFR profiles.
Example: E-commerce platform
| Operation | Availability | Latency | Consistency | Durability |
| --- | --- | --- | --- | --- |
| Browse product catalog | 99.99% | P99 < 100ms | Eventual (30s stale OK) | N/A (read-only) |
| Add to cart | 99.99% | P99 < 200ms | Session-consistent | Best effort (cart in Redis) |
| Checkout / payment | 99.99% | P99 < 1s | Strong | Zero loss |
| View order history | 99.9% | P99 < 500ms | Eventual (1 min stale OK) | N/A (read-only) |
| Search products | 99.9% | P99 < 300ms | Eventual (minutes stale OK) | N/A (read-only) |
Notice that product browsing and checkout have completely different consistency and durability requirements. Designing both with strong consistency wastes resources. Designing both with eventual consistency risks selling items you don't have.
The right approach: identify the 2 to 3 operations with the most demanding NFRs and design your architecture around those. The less demanding operations can ride on simpler paths.
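In code, per-operation consistency often surfaces as read-routing policy: strongly consistent operations must hit the primary, while everything else can use replicas or cache. A minimal sketch (operation names from the table above; the routing rule is an illustrative assumption):

```python
# Consistency requirement per operation, taken from the e-commerce table above.
CONSISTENCY = {
    "browse_catalog": "eventual",
    "add_to_cart": "session",
    "checkout": "strong",
    "order_history": "eventual",
    "search": "eventual",
}

def read_target(operation: str) -> str:
    """Strong reads must hit the primary; weaker reads can use replicas or cache."""
    level = CONSISTENCY[operation]
    if level == "strong":
        return "primary"
    if level == "session":
        return "session-pinned replica"
    return "replica-or-cache"

print(read_target("checkout"))        # primary
print(read_target("browse_catalog"))  # replica-or-cache
```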
Don't say 'strong consistency everywhere'
When a candidate says "I'll use strong consistency for everything," that's a red flag. It means they're not thinking about tradeoffs. Strong consistency for a like counter is overkill that adds latency and reduces availability for no user-visible benefit. Show you can apply different consistency models to different data types within the same system.
Common mistakes

Not quantifying NFRs. "The system should be fast and reliable" is not an NFR specification. Every NFR needs a number: P99 under 200ms, 99.99% availability, 50K RPS peak. Without numbers, your architecture decisions are unjustified.
Treating all NFRs as equally important. Every NFR has a cost. 99.999% availability is 10x more expensive and complex than 99.99%. If you design for five-nines availability on an internal dashboard, you're over-engineering. Prioritize the NFRs that matter most for the specific use case.
Ignoring NFR tensions. If you claim 99.999% availability AND strong consistency AND sub-10ms latency, the interviewer knows you don't understand the tradeoffs. These are in tension. Acknowledge the tension and explain your choice.
Applying NFRs system-wide. As discussed above, different operations need different NFR profiles. Applying the strictest requirement to every operation wastes resources and adds unnecessary complexity.
Not connecting NFRs to architecture decisions. The whole point of NFRs is to justify your design choices. If you say "99.99% availability" but then draw a single-node database, there's a disconnect. Every NFR should map to at least one architecture decision.
Forgetting scalability direction. "It needs to scale" means nothing. Scale what? Reads? Writes? Storage? Users? Each dimension has different solutions. A read-scaling problem (add replicas, caching) is fundamentally different from a write-scaling problem (sharding, partitioning).
NFRs are tested in the first 5 minutes of every system design interview. The interviewer wants to see whether you ask about constraints before designing, and whether you connect those constraints to architecture decisions.
What interviewers evaluate:
| Signal | Mid-level | Senior | Staff |
| --- | --- | --- | --- |
| Asks about NFRs | Mentions 1-2 | Covers all 6 systematically | Identifies per-operation NFR profiles |
| Quantifies NFRs | Vague ("fast") | Specific ("P99 < 200ms") | Justifies each number from estimation |
| Connects NFR to architecture | Implicit | Explicit ("Redis because P99 < 50ms") | Traces the full chain: NFR to component to config |
| Handles tradeoffs | Picks one side | Acknowledges tension | Proposes per-operation tradeoff matrix |
| Adjusts during interview | Rigid | Adapts when interviewer pushes | Proactively offers: "If we relaxed consistency, we could do X" |
Common follow-up questions from interviewers:
| Interviewer asks | Strong response |
| --- | --- |
| "Why did you choose eventual consistency?" | "The 200ms P99 target wouldn't survive the coordination overhead of strong consistency across replicas. A 5-second staleness window is acceptable for timeline reads since users don't notice sub-5s delays." |
| "What happens if availability drops below 99.99%?" | "With 99.99%, we budget 52.6 minutes of downtime per year. If we're burning through that budget, we'd investigate: is it a single AZ failure (covered by multi-AZ), a deploy issue (roll back), or a systemic problem (needs multi-region)?" |
| "Can you make this faster?" | "The main latency contributor is the database read at P99. Options: (1) add a Redis cache for the hot path, reducing P99 from 150ms to 10ms for cache hits, (2) pre-compute the response, eliminating the read entirely, (3) move computation to the edge." |
| "What if we need strong consistency for this?" | "Strong consistency here means single-writer primary with synchronous replication. Write latency goes from 5ms to 20ms. Read latency stays similar if we route reads to the primary. Availability during partitions drops since we'd reject writes rather than risk inconsistency." |
The NFR power move
After stating your NFRs, draw a small table on the whiteboard: NFR on the left, Architecture Decision on the right. Fill it in as you design. "P99 < 200ms maps to Redis cache. 99.99% maps to multi-AZ. 100K RPS maps to horizontal app tier." This table becomes a live design rubric that the interviewer can follow, and it makes your reasoning transparent.
Key takeaways

- NFRs drive architecture more than features do. The same feature produces completely different architectures depending on latency, consistency, and availability targets.
- The 6 core NFRs to clarify in every interview: Availability, Latency, Throughput, Consistency, Durability, and Scalability. Quantify each one.
- NFRs trade off against each other. You cannot simultaneously maximize availability, consistency, and latency. Senior designers choose which to optimize and state the tradeoff explicitly.
- Use the NFR sentence template before designing: "We need X availability with Y latency for Z throughput, accepting [consistency model]."
- NFRs are per-operation, not system-wide. Different operations within the same system have different availability, latency, and consistency requirements.
- Connect every NFR to at least one architecture decision. "Redis because P99 < 50ms." "Multi-AZ because 99.99% availability." "Async replication because eventual consistency is acceptable."
- State NFR tensions before making choices. "There's a tension between latency and durability here. I'm choosing async replication for speed, accepting bounded data loss of up to 5 seconds."