System design vocabulary: 100 terms, precisely defined
The 100 most important system design terms, defined with precision. Use this to communicate accurately in interviews and code reviews: knowing exactly how 'partition tolerance' differs from 'availability', or 'durability' from 'consistency'.
Why Precise Vocabulary Matters
"The database is consistent" means nothing. Consistent how? Strongly consistent? Eventually consistent? Sequentially consistent? These are distinct guarantees with very different implications.
Imprecise vocabulary in system design interviews signals that you understand the concept conceptually but haven't worked deeply with the details. Interviewers at staff+ level use precise vocabulary and expect the same in return.
Core Reliability Terms
Availability: The percentage of time a system is operational and responding to requests. 99.9% = ~8.7h downtime/year; 99.99% = ~52 minutes; 99.999% = ~5 minutes.
Durability: Once a write is acknowledged, the data is not lost — not even if nodes fail. Availability = the system is up. Durability = your data survives.
Reliability: The probability that a system performs correctly for a given time period. Usually expressed as MTBF (mean time between failures).
Fault tolerance: The system continues operating correctly despite component failures. Achieved by redundancy — no single point of failure.
Resilience: The ability to recover from failures, not just avoid them. Resilient systems degrade gracefully and recover quickly.
MTBF (Mean Time Between Failures): Average time between failures.
MTTR (Mean Time to Repair): Average time to restore service after a failure. Lower MTTR = higher availability.
SLA (Service Level Agreement): Contractual commitment to a customer on availability/performance.
SLO (Service Level Objective): Internal target for availability/performance. SLOs are typically stricter than the SLA they support, leaving headroom before a contractual breach.
SLI (Service Level Indicator): The actual measured metric (request success rate, latency p99).
Error budget: 100% minus your SLO availability. 99.9% SLO = 0.1% error budget = ~43 minutes downtime/month. Teams spend this budget on risky deployments; when it's burned, slow down releases.
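The error-budget arithmetic above is easy to verify directly. A minimal helper (the function name and the 30-day month are illustrative assumptions, not a real API):

```python
# Downtime allowed per period for a given SLO -- pure arithmetic.
def downtime_minutes(slo: float, period_days: float = 30) -> float:
    error_budget = 1.0 - slo          # e.g. 99.9% SLO -> 0.1% budget
    return error_budget * period_days * 24 * 60

print(round(downtime_minutes(0.999, 30), 1))    # ~43.2 minutes/month at 99.9%
print(round(downtime_minutes(0.9999, 365), 1))  # ~52.6 minutes/year at 99.99%
```

This is why each extra "nine" is so expensive: the budget shrinks by 10x every time.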
Consistency Terms
Strong consistency: Every read returns the most recently written value. After a write completes, all subsequent reads return that value.
Eventual consistency: If writes stop, all replicas will converge to the same value eventually. Reads may return stale values in the short term.
Causal consistency: Reads reflect all writes that causally preceded them. If you posted a comment (write A) and then read your post (read B), B reflects A. Weaker than strong, stronger than eventual.
Read-your-writes consistency: After writing a value, you always read back at least that value. A relaxation of causal consistency: it only guarantees that your own writes are visible to you.
Monotonic reads: Once you read a value at time T, you will never read a value older than T. Prevents "time travel" where reads appear to go backwards.
Linearizability: The strongest correctness guarantee. Every operation appears to take effect instantaneously at some point between its invocation and response, as if there were a single copy of the data on a single global timeline.
Serializability: Concurrent transactions produce a result equivalent to some serial execution. The isolation guarantee for multi-object transactions (it constrains isolation, not real-time ordering).
Isolation level: How much one transaction is exposed to another's in-progress changes. From weakest to strongest: read uncommitted → read committed → repeatable read → serializable.
Scalability Terms
Vertical scaling (scale up): Adding more CPU/RAM/disk to an existing server.
Horizontal scaling (scale out): Adding more servers.
Throughput: Number of requests or transactions processed per second.
Latency: Time to process a single request. p50 = median; p99 = 99th percentile latency; p999 = 99.9th percentile. Always talk in percentiles, not averages.
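A sketch of why percentiles matter, using the nearest-rank method (a simplification; production systems use streaming estimators like t-digest):

```python
# Percentile latency from raw samples (nearest-rank on the sorted list).
def percentile(samples, p):
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(round(p / 100 * (len(s) - 1)))))
    return s[k]

latencies_ms = [12, 15, 14, 13, 210, 16, 14, 13, 15, 14]  # one slow outlier
print(percentile(latencies_ms, 50))  # median: 14 -- hides the outlier
print(percentile(latencies_ms, 99))  # tail: 210 -- exposes it
```

The mean of these samples is ~34ms, which describes no actual request: the tail percentiles are what users experience at the worst.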
Bandwidth: Maximum data transfer rate of a network link.
Bottleneck: The component whose capacity limits overall throughput. Adding capacity elsewhere doesn't help.
Sharding (horizontal partitioning): Splitting a dataset across multiple nodes, each owning a subset. Enables write scaling beyond a single node's capacity.
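The simplest sharding scheme hashes the shard key and takes it modulo the node count (a sketch; note that changing the shard count remaps most keys, which is the problem consistent hashing solves):

```python
import hashlib

NUM_SHARDS = 4  # illustrative; real systems pick this carefully

def shard_for(key: str) -> int:
    # Deterministic: the same key always maps to the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

assert shard_for("user:42") == shard_for("user:42")
print(shard_for("user:42"))  # some shard in [0, 4)
```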
Replication: Maintaining multiple copies of data for fault tolerance or read scaling.
Read replica: A copy of a database that serves read traffic, while the primary handles writes.
Primary-secondary (leader-follower) replication: One node accepts writes; others replicate and serve reads.
Multi-master replication: Multiple nodes accept writes; conflict resolution is required.
Fanout: A write or notification that distributes to many recipients. Fan-out write = push to all at write time. Fan-out read = pull from all at read time.
Data Storage Terms
ACID: Atomicity (all or nothing), Consistency (constraints hold), Isolation (transactions don't interfere), Durability (committed writes survive).
BASE: Basically Available, Soft state, Eventually consistent. Contrast with ACID.
WAL (Write-Ahead Log): Changes are logged before being applied to data files. Enables crash recovery and replication.
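The WAL invariant is: append and flush the log record before applying the change. A toy key-value store illustrating it (file format and class name are invented for illustration):

```python
import json, os, tempfile

class TinyKV:
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        if os.path.exists(wal_path):          # crash recovery: replay the log
            with open(wal_path) as f:
                for line in f:
                    rec = json.loads(line)
                    self.data[rec["k"]] = rec["v"]

    def put(self, k, v):
        with open(self.wal_path, "a") as f:
            f.write(json.dumps({"k": k, "v": v}) + "\n")
            f.flush()
            os.fsync(f.fileno())              # durable BEFORE we acknowledge
        self.data[k] = v                      # apply only after logging

path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = TinyKV(path); db.put("a", 1); db.put("a", 2)
recovered = TinyKV(path)                      # simulate restart after a crash
print(recovered.data["a"])                    # -> 2
```

Replaying the same log on two machines is also the basis of log-shipping replication.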
B-tree index: The standard database index structure. O(log n) reads and writes. Efficient for range queries.
LSM tree (Log-Structured Merge-tree): Write-optimized storage. Appends to an in-memory buffer (memtable), flushes sorted immutable files (SSTables). Used in LevelDB, RocksDB, Cassandra.
SSTable: Sorted String Table. An immutable, sorted file segment used in LSM-tree storage.
Compaction: Background process that merges and rewrites SSTables in LSM-tree storage to reclaim space and reduce read amplification.
Write amplification: Each logical write results in multiple physical writes (e.g., to WAL, to data file, to index). High write amplification degrades write throughput.
Read amplification: Each logical read requires multiple physical reads (e.g., checking multiple SSTable levels). High read amplification degrades read throughput.
MVCC (Multi-Version Concurrency Control): Multiple versions of rows are maintained. Readers see a consistent snapshot without blocking writers.
Network Terms
Partition tolerance (CAP theorem): The system continues to operate when network messages are lost or delayed between nodes.
Network partition: A network failure that splits the cluster into two groups that cannot communicate with each other.
RTT (Round-Trip Time): Time for a packet to travel from sender to receiver and back. Typical values: same datacenter < 1ms; cross-continent ~100ms; satellite ~500ms.
TCP: Reliable, ordered, connection-oriented protocol. Guarantees delivery and ordering at the cost of latency overhead.
UDP: Unreliable, unordered, connectionless. Lower latency; no delivery guarantee.
mTLS (Mutual TLS): Both client and server authenticate using certificates. Used for service-to-service authentication.
Idempotent: An operation that produces the same result regardless of how many times it is applied. PUT /resource/123 is idempotent; POST /items is typically not.
Exactly-once: Each message or operation is processed exactly once. Hard to achieve; at-least-once (easier) + idempotent processing = effectively exactly-once.
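The at-least-once + idempotency combination can be sketched as a handler that deduplicates on a request ID (here a plain set stands in for the durable dedup store a real system needs):

```python
processed_ids = set()          # in practice: a durable store, not memory
balance = {"acct": 100}

def apply_credit(request_id: str, account: str, amount: int) -> None:
    if request_id in processed_ids:   # duplicate delivery: ignore
        return
    balance[account] += amount
    processed_ids.add(request_id)

apply_credit("req-1", "acct", 50)
apply_credit("req-1", "acct", 50)     # redelivered message: no double-credit
print(balance["acct"])                # -> 150, not 200
```

In a real system the dedup check and the balance update must commit atomically, or a crash between them reintroduces the duplicate.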
Load Distribution Terms
Load balancer: Distributes incoming requests across multiple backend instances.
Sticky sessions (session affinity): Routes all requests from one user to the same backend. Necessary for stateful applications; avoid for stateless ones.
Round robin: Each request goes to the next server in sequence. Simple; ignores server load.
Least connections: Routes each request to the server with fewest active connections. Better for varying request durations.
Consistent hashing: Distributes requests using a hash ring so that adding/removing nodes affects a minimal number of keys.
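A minimal consistent hash ring with virtual nodes (a sketch; node names and the vnode count are illustrative). Each physical node owns many points on the ring, so load spreads evenly and removing a node only remaps the keys that hashed to its points:

```python
import bisect, hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=100):
        # Each node contributes `vnodes` points on the ring.
        self.points = sorted((_h(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))
        self.keys = [p for p, _ in self.points]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash.
        i = bisect.bisect(self.keys, _h(key)) % len(self.keys)
        return self.points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # one of the three nodes, deterministically
```

Keys not owned by a removed node keep their owner, because the surviving nodes' ring points are unchanged.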
Health check: Periodic ping to backends; unhealthy backends are removed from rotation.
Caching Terms
Cache hit: Requested data is found in cache.
Cache miss: Data is not in cache; must be fetched from source.
Hit rate: hits / (hits + misses). Database load is driven by the miss rate, so improvements compound: going from a 90% to a 99% hit rate cuts backend reads by 10x.
Eviction policy: Rules for which entries are removed when the cache is full. LRU (least recently used), LFU (least frequently used), TTL expiry.
Cache stampede (thundering herd): A popular cache key expires and many concurrent requests all miss and simultaneously query the database.
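One common mitigation is single-flight refill: on a miss, only one caller recomputes the value while the rest wait. A sketch (a single global lock here; production code uses a per-key lock, and `db_query` is an invented stand-in):

```python
import threading

cache = {}
lock = threading.Lock()
db_calls = 0

def db_query(key):
    global db_calls
    db_calls += 1                 # count how often the database is hit
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with lock:                    # only one thread refills per miss
        if key not in cache:      # re-check: another thread may have won
            cache[key] = db_query(key)
    return cache[key]

threads = [threading.Thread(target=get, args=("hot",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(db_calls)                   # -> 1, not 50
```

Other mitigations include probabilistic early expiry and serving stale data while one worker refreshes.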
TTL (Time to Live): How long a cache entry is valid before expiring.
Cache-aside: Application code checks the cache on read and, on a miss, fetches from the database and populates the cache; on write, it updates the database then invalidates the cache entry.
Write-through: Writes go to cache and synchronously to database.
Write-behind: Writes go to cache; database write is deferred and async.
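The cache-aside pattern, the most common of the three, can be sketched with plain dicts standing in for the real cache and database:

```python
db = {"user:1": {"name": "Ada"}}   # source of truth
cache = {}

def read(key):
    if key in cache:
        return cache[key]          # cache hit
    value = db.get(key)            # cache miss: go to the database
    if value is not None:
        cache[key] = value         # backfill for the next reader
    return value

def write(key, value):
    db[key] = value                # update the database first...
    cache.pop(key, None)           # ...then invalidate (not update) the cache

read("user:1")                     # warms the cache
write("user:1", {"name": "Ada L."})
print(read("user:1")["name"])      # -> "Ada L." (refetched after invalidation)
```

Invalidating rather than updating on write avoids a race where a concurrent reader overwrites the cache with a stale value.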
Quick Recap
- Availability (is the system up?) is distinct from durability (is the data safe?). You can have high availability with low durability (data in memory only) or high durability with planned downtime.
- Strong consistency, eventual consistency, causal consistency, and linearizability are distinct guarantees — "consistent" alone is meaningless. Know the difference: linearizability = single global timeline; causal = causally related writes are observed in order; read-your-writes = your own writes are visible to you; eventual = replicas converge only after writes stop.
- Throughput (requests per second) and latency (time per request) are orthogonal. High throughput doesn't mean low latency. Always specify latency as percentiles (p99, p999) — averages hide the tail.
- ACID vs. BASE: transactions with ACID guarantees give you strong consistency; BASE (Basically Available, Soft state, Eventually consistent) trades consistency for availability and partition tolerance in distributed systems.
- Idempotency is a system-design primitive. At-least-once delivery + idempotent operations = effectively exactly-once semantics. Design your write handlers to be idempotent (by deduplicating on a request ID) whenever possible.