Networking
Master the networking protocols, load balancing strategies, and failure-handling patterns that underpin every system design interview, from TCP vs UDP to L4 vs L7 load balancers.
TL;DR
- Networking is the connective tissue of every distributed system. Every service you draw on a whiteboard needs to talk to other services, and how they talk determines your latency, reliability, and scalability ceiling.
- Three layers matter for interviews: IP handles addressing and routing, TCP/UDP handle reliable (or fast) delivery, and application protocols (HTTP, WebSockets, gRPC) define how your services exchange data.
- TCP is your default. Use UDP only when you can tolerate packet loss and need minimal latency (live video, gaming). QUIC is the modern upgrade path: mention it to impress, default to TCP.
- REST is your default API. Reach for GraphQL when flexible client queries matter, gRPC when internal service throughput is critical. For real-time push, SSE covers most use cases; WebSockets when you need bidirectional; WebRTC only for peer-to-peer audio/video.
- Load balancers distribute traffic and detect failures. L7 for HTTP traffic, L4 for WebSocket or raw TCP. Client-side load balancing for internal microservices. Always mention health checks.
- Networks fail. Retries with exponential backoff, idempotency keys, and circuit breakers are how you survive it, and every senior interviewer expects you to know them.
The Problem It Solves
Your microservices architecture looks perfect on the whiteboard. Five services, clean arrows, a database behind each one. You deploy to production and everything works. For about three hours.
Then the order service tries to call the payment service. The payment service is overloaded and takes 30 seconds to respond. The order service holds the connection open, eating one of its 200 threads.
More orders come in, each blocking on the payment service. Within four minutes, the order service has exhausted its thread pool, and now the API gateway can't reach the order service either. Users see 503s across the board.
Nothing crashed. No server died. The network between two services just got slow, and that slowness cascaded through your entire system because nobody thought about how services actually communicate.
I see this pattern in almost every first-attempt system design. Engineers draw boxes and arrows but treat the arrows as magic: instant, reliable, free. They're not.
Every arrow is a network call with latency, a protocol with overhead, and a connection that can fail. Understanding networking transforms those arrows from handwaving into deliberate engineering decisions.
The network is not reliable, and your design must prove you know that
The eight fallacies of distributed computing start with "the network is reliable." In interviews, the difference between a mid-level and senior answer is whether you treat network calls as infallible or design around their failure. Every service-to-service arrow on your whiteboard needs a timeout, a retry strategy, and a plan for when it fails.
The arrows between your services carry your system's entire communication burden; getting them right is the difference between a resilient architecture and a house of cards.
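Timeout, retry with backoff, and a single idempotency key across attempts are the three ingredients that turn a fragile arrow into a survivable one. A minimal sketch (every name here is illustrative, not a library API):

```typescript
// Exponential backoff with jitter: the delay grows per attempt (capped), and the
// random jitter keeps a fleet of failing clients from retrying in lockstep.
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 10_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2); // "equal jitter" variant
}

async function fetchWithRetry(url: string, maxAttempts = 4): Promise<Response> {
  // One idempotency key for ALL attempts: if attempt 1 succeeded server-side
  // but the response was lost, the retry won't duplicate the work.
  const idempotencyKey = crypto.randomUUID();
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await fetch(url, {
        headers: { "Idempotency-Key": idempotencyKey },
        signal: AbortSignal.timeout(2_000), // never wait forever on a slow peer
      });
      if (res.ok) return res;
      if (res.status < 500) return res; // 4xx: retrying won't help
      throw new Error(`server error ${res.status}`);
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

The key design choice: the timeout bounds how long one attempt can hold a thread or connection, which is exactly what the order-service story above was missing.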
What Is It?
Networking is how independent machines exchange data across physical and virtual connections. In system design, it's the set of protocols, patterns, and infrastructure that determine how your services discover each other, communicate, handle failures, and scale.
Analogy: Think of a large hospital. Doctors, nurses, pharmacists, and lab technicians are all specialists (services) who need to coordinate patient care. They don't all stand in the same room shouting; they use pagers (UDP), phone calls (TCP), written orders on clipboards (HTTP), and real-time intercoms during surgery (WebSockets).
The pager is fast but you might miss a page, while the phone call guarantees you reach someone but takes time to dial. The clipboard order creates a paper trail but is slow. During surgery, you need immediate two-way communication; nothing else will do.
Networking in system design is about choosing the right communication channel for each interaction, understanding the cost of each choice, and designing for what happens when the channel breaks.
For your interview: know these three layers and what each one gives you. The network layer handles addressing (IP), the transport layer handles reliability (TCP) or speed (UDP), and the application layer handles your business logic protocol (HTTP, WebSockets, gRPC). Everything else is implementation detail you can skip unless asked.
How It Works
Let's trace a single web request end-to-end. When you type example.com into your browser, a carefully orchestrated sequence of protocol interactions unfolds across all three layers. Understanding this flow is the foundation for every networking decision in system design.
sequenceDiagram
participant B as Browser
participant DNS as DNS Resolver
participant S as Web Server
Note over B,S: Step 1: DNS Resolution
B->>DNS: What IP is example.com?
DNS-->>B: 93.184.216.34
Note over B,S: Step 2: TCP 3-Way Handshake
B->>S: SYN ("I want to connect")
S-->>B: SYN-ACK ("Acknowledged, let's go")
B->>S: ACK ("Connection established")
Note over B,S: Step 3: TLS Handshake (HTTPS)<br/>TLS 1.3: 1 RTT · TLS 1.2: 2 RTTs
B->>S: ClientHello + supported ciphers
S-->>B: ServerHello + certificate
B->>S: Key exchange · encrypted channel ready
Note over B,S: Step 4: HTTP Request/Response
B->>S: GET / HTTP/1.1<br/>Host: example.com
activate S
S-->>B: HTTP 200 OK<br/>Content-Type: text/html<br/>[page content]
deactivate S
Note over B,S: Step 5: TCP Teardown
B->>S: FIN ("I'm done")
S-->>B: ACK + FIN
B->>S: ACK ("Connection closed")
Here's what happened across those layers:
- DNS resolution (Application Layer): Your browser translates `example.com` into an IP address like `93.184.216.34`. This lookup usually takes 1-50ms depending on caching.
- TCP handshake (Transport Layer): A three-way handshake (`SYN → SYN-ACK → ACK`) establishes a reliable, ordered byte stream. One round trip of latency before any data flows.
- TLS handshake (Transport/Application): For HTTPS, another 1-2 round trips to negotiate encryption. TLS 1.3 reduces this to one round trip; 0-RTT resumption eliminates it for returning visitors.
- HTTP request/response (Application Layer): Your browser sends `GET / HTTP/1.1` with headers; the server returns `200 OK` with the page content.
- TCP teardown: A four-way handshake (`FIN → ACK → FIN → ACK`) closes the connection cleanly.
// The entire sequence above in a single line of application code:
const response = await fetch('https://example.com');
// Underneath: DNS lookup + TCP connect + TLS negotiate + HTTP transfer + TCP close
// Total latency: DNS (1-50ms) + TCP RTT (1-100ms) + TLS (1-100ms) + server processing
The key observation: one conceptual "request" involves many round trips at lower layers. The higher you go in the stack, the more convenient the abstraction, but also the more latency you're paying. This tension between convenience and performance surfaces in every protocol decision you'll make.
Why this matters for your design
Without HTTP keep-alive or HTTP/2 multiplexing, every single request repeats the TCP and TLS handshakes. For a webpage that loads 50 assets, that's 50 × (TCP + TLS) round trips of potentially seconds of overhead. This is why connection reuse is the single most impactful HTTP optimization, and why HTTP/2 multiplexing was invented.
Every protocol decision you make in an interview carries this overhead. The question isn't just "what data do I send?"; it's "how many round trips does it cost and can I afford them?"
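A quick back-of-envelope sketch makes the handshake tax concrete. The function and its numbers are illustrative, and it assumes serial HTTP/1.1 requests with TLS 1.3 (one TCP round trip plus one TLS round trip per new connection):

```typescript
// Rough page-load latency model: per-asset round trips × round-trip time.
function pageLoadMs(assets: number, rttMs: number, reuseConnection: boolean): number {
  const handshakesPerAsset = reuseConnection ? 0 : 2; // TCP (1 RTT) + TLS 1.3 (1 RTT)
  const requestRtt = 1;                               // the HTTP request/response itself
  const firstConnection = reuseConnection ? 2 * rttMs : 0; // still pay one handshake up front
  return firstConnection + assets * (handshakesPerAsset + requestRtt) * rttMs;
}

// 50 assets at a 50 ms round trip:
pageLoadMs(50, 50, false); // 50 assets × 3 RTT = 7500 ms of pure network waiting
pageLoadMs(50, 50, true);  // 100 ms handshake + 50 × 1 RTT = 2600 ms with keep-alive
```

Real browsers open parallel connections and pipeline work, so treat this as the shape of the problem, not a benchmark.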
Key Components
| Component | Role |
|---|---|
| DNS | Translates domain names to IP addresses; first step of every request |
| TCP | Reliable, ordered byte stream; the default transport for all web traffic |
| UDP | Best-effort, connectionless transport; used when speed beats reliability |
| HTTP/HTTPS | Stateless request-response protocol; the foundation of web APIs |
| Load Balancer | Distributes traffic across servers; detects and routes around failures |
| TLS | Encrypts data in transit; mandatory for any production system |
| WebSocket | Persistent bidirectional channel for real-time communication |
| gRPC | Binary RPC framework for high-performance internal service communication |
The Networking Stack
While the full OSI model has 7 layers, only three consistently appear in system design interviews. Let's go through each one and understand what it gives us as application developers.
Network Layer: IP
The Internet Protocol (IP) handles two things: addressing (where is the destination?) and routing (how do packets get there?). Every machine on a network gets an IP address β either assigned by DHCP when it boots or configured statically.
Public IPs are routable across the internet. The backbone infrastructure knows that addresses starting with 17.x.x.x belong to Apple, and routes packets accordingly. Private IPs (like 10.0.0.x or 192.168.x.x) only work within a local network and require Network Address Translation (NAT) to reach the public internet.
For your interview: IP is plumbing. You almost never need to discuss it explicitly. The one exception is when you're designing for multi-region deployments; then you'll need to talk about IP-based routing, Anycast (multiple servers sharing one IP for geographic routing), and how DNS maps domain names to different IPs in different regions.
Transport Layer: TCP, UDP, and QUIC
This is where things get interesting for system design. The transport layer determines the reliability and performance characteristics of your communication.
The next section breaks down each protocol in detail, but here's the one-liner: TCP guarantees delivery and ordering at the cost of latency. UDP sacrifices both for speed. QUIC gives you TCP's reliability with UDP's performance, but it's still gaining adoption.
Application Layer: Where You Live
Everything above the transport layer is the application layer: HTTP, WebSockets, gRPC, DNS, and every custom protocol you might design. This is where 90% of your interview decisions happen.
The application layer runs in user space, meaning you control it entirely. Transport and below run in the kernel: fast, but inflexible. This distinction matters: changing your HTTP serialization format is a deploy, but changing TCP congestion control requires a kernel update across your fleet.
Most of your design decisions live at the application layer. The transport and network layers are infrastructure choices you make once and rarely revisit. For your interview: spend your time on application-layer decisions; that's where you have control and where interviewers expect depth.
Transport Protocols Deep Dive
For most system design interviews, the real choice is between TCP and UDP. QUIC is increasingly relevant but still supplementary knowledge. Let me walk through each one.
TCP: The Reliable Workhorse
Transmission Control Protocol (TCP) is a connection-oriented, reliable, ordered byte stream protocol. It guarantees that data arrives in the order it was sent, retransmitting anything lost along the way.
The connection is called a "stream": a stateful, ordered channel between client and server. Two messages sent on the same stream arrive in the same order. TCP handles acknowledgement, retransmission, flow control (don't overwhelm the receiver), and congestion control (don't overwhelm the network).
Key characteristics:
- Connection-oriented: Three-way handshake before data flows
- Reliable delivery: Every byte acknowledged; lost packets retransmitted
- Ordering guaranteed: Bytes arrive in the order sent
- Flow control: Receiver advertises how much data it can handle
- Congestion control: Sender adapts rate to avoid network overload
TCP is the default for almost everything. If you're not sure which transport protocol to use, use TCP. Interviewers expect it as the baseline and won't ask you to justify it.
UDP: Speed Over Safety
User Datagram Protocol (UDP) is a connectionless, best-effort protocol. No handshake, no acknowledgements, no ordering. You fire packets into the void and hope they arrive.
What you get for that lack of guarantees is speed. UDP adds only 8 bytes of header (vs TCP's 20-60 bytes) and has zero connection setup overhead. The first byte of real data can be on the wire immediately.
Key characteristics:
- Connectionless: No handshake, no state, no teardown
- Best-effort delivery: Packets can be lost, duplicated, or reordered
- No flow/congestion control: Sender can blast at any rate
- Minimal overhead: 8-byte header, no ACK traffic
So why would anyone use a protocol that doesn't guarantee delivery? Because for some applications, getting data fast is more important than getting every packet.
When UDP wins:
- Live video/audio streaming: a dropped frame is invisible; a retransmitted frame arrives too late to display
- Online gaming: knowing where a player was 200ms ago is useless; you want their position now
- DNS lookups: small, stateless queries where retrying from scratch is faster than TCP handshake + retry
- Telemetry/metrics collection: losing 0.1% of data points doesn't affect aggregates
The browser problem with UDP
Browsers don't natively support UDP sockets. The only way to send UDP from a browser is through WebRTC (covered below). If your design needs UDP-like speed for browser clients, you'll need WebRTC for real-time media or fall back to HTTP/WebSocket for everything else. App-native clients (iOS/Android) can use UDP directly.
My recommendation: default to TCP in interviews. When you reach for UDP, you should be able to say exactly why packet loss is acceptable in your use case. If you can't articulate that in one sentence, stick with TCP.
QUIC: The Modern Compromise
QUIC is a transport protocol built by Google on top of UDP that provides TCP-like reliability with significant performance improvements. HTTP/3 runs on QUIC. It's gaining adoption rapidly: Chrome, Firefox, and Safari all support it, and Cloudflare and Google serve significant traffic over QUIC.
What QUIC fixes:
- Zero-RTT connection establishment: Returning clients can send data immediately, no handshake wait
- No head-of-line blocking: In TCP, one lost packet blocks all streams. QUIC multiplexes independent streams, so a lost packet only stalls the affected stream
- Built-in encryption: TLS 1.3 is mandatory and integrated into the handshake, reducing total round trips
- Connection migration: When your phone switches from Wi-Fi to cellular, the connection survives because QUIC identifies connections by ID, not by IP:port
For interviews, think of QUIC as "better TCP." Mention it when discussing mobile-first designs or global services β your interviewer will be impressed. But don't build your entire design around it; TCP is the safe, universal default.
Choosing Your Transport Protocol
Here's the honest decision framework. Most of the time this decision is obvious; the hard part is knowing the exceptions.
| Scenario | Protocol | Why |
|---|---|---|
| Web APIs, database connections, file transfers | TCP | Data integrity is non-negotiable |
| Live video/audio streaming | UDP | Late data is useless; drop it |
| Online gaming (position updates) | UDP | Stale positions are worse than missing ones |
| DNS lookups | UDP | Tiny stateless queries; retry is cheaper than handshake |
| Mobile-first HTTP services | QUIC | Connection migration + reduced HOL blocking |
| IoT telemetry (high volume, lossy OK) | UDP | Losing 0.1% of sensor readings is fine |
| Internal microservice communication | TCP (or QUIC) | Reliability between services is table stakes |
The bottom line: TCP until you have a specific reason for UDP. QUIC if you want bonus points and your clients support it.
Application Layer Protocols
The application layer is where most of your interview design decisions live. These protocols define how your services exchange data, and each one carries its own set of trade-offs around performance, flexibility, and complexity.
HTTP/HTTPS: The Web's Foundation
Hypertext Transfer Protocol (HTTP) is a stateless, request-response protocol. The client sends a request, the server sends a response, and neither remembers the other. Every web page, every API call, every image download: HTTP.
HTTP is stateless by design, and that's a feature, not a limitation. Stateless services are dramatically easier to scale: any server can handle any request because no server needs to remember previous interactions. Move session state to Redis, auth tokens to JWTs, and keep your HTTP servers as pure functions of (request → response).
Key concepts you should know:
| Concept | Examples | Interview relevance |
|---|---|---|
| Request methods | GET, POST, PUT, PATCH, DELETE | Demonstrates REST understanding |
| Status codes | 200, 201, 301, 400, 401, 403, 404, 429, 500, 502, 503 | Error handling design decisions |
| Headers | Content-Type, Authorization, Cache-Control, Accept-Encoding | Caching, auth, content negotiation |
| Body | JSON, protobuf, form data, multipart | Serialization format choice |
HTTPS wraps HTTP in TLS encryption. For any production system, HTTPS is non-negotiable; your interviewer assumes it. Don't burn interview time explaining that you'll use HTTPS; just use it.
HTTP is the foundation everything else builds on. Know it cold, not because it's exciting, but because every other protocol is defined in contrast to it.
REST: Resource-Oriented APIs
REST (Representational State Transfer) is not a protocol; it's a convention for using HTTP to build APIs. The core idea: model your system as resources (nouns) and use HTTP methods (verbs) to operate on them.
// A simple RESTful API for managing users
// GET    /users          - List all users
// POST   /users          - Create a new user
// GET    /users/{id}     - Get a specific user
// PUT    /users/{id}     - Update a user (full replacement)
// PATCH  /users/{id}     - Update specific fields
// DELETE /users/{id}     - Delete a user

// Nested resources express relationships:
// GET  /users/{id}/posts       - List a user's posts
// POST /users/{id}/posts       - Create a post for a user
// GET  /users/{id}/posts/{pid} - Get a specific post
REST's superpower is simplicity. Your API endpoints map directly to your data model. If you've identified your core entities (User, Post, Order, Product), your REST API practically writes itself.
Common mistakes I see in interviews:
- Using verbs in URLs: `POST /createUser` instead of `POST /users`
- Inconsistent naming: mixing `/users/{id}` with `/get-user?id=123`
- Forgetting pagination: `GET /users` returning 2 million rows
- Not using proper status codes: returning `200` with `{ "error": "not found" }` instead of `404`
For your interview: REST is the default; use it unless you have a specific reason for GraphQL or gRPC. Say "I'll design a RESTful API with these endpoints..." and list your core resources.
GraphQL: Flexible Data Fetching
GraphQL lets clients request exactly the data they need in a single query. Instead of the server deciding what each endpoint returns, the client specifies the shape of the response.
Under-fetching: Your mobile app needs a user's name, their latest post, and that post's comment count. With REST, that's three separate API calls: GET /users/{id}, GET /users/{id}/posts?limit=1, and GET /posts/{pid}/comments/count. Three round trips, three connections, maybe 300ms of mobile latency.
Over-fetching: Your REST endpoint GET /users/{id} returns 50 fields (name, email, avatar, bio, preferences, settings, ...). Your mobile app only needs 3 of them. You're transferring 10× more data than needed on a slow cellular connection.
GraphQL eliminates both: one query, one round trip, exactly the fields you need.
query {
user(id: "123") {
name
latestPost {
title
commentCount
}
}
}
Where GraphQL shines: Mobile apps with bandwidth constraints. Frontend teams that iterate rapidly and need different data for different views. Public APIs where you can't predict what clients will need (GitHub's API v4 is GraphQL).
Where GraphQL struggles: In system design interviews, the requirements are fixed, so GraphQL's flexibility doesn't add much when you know exactly what every page needs. Additionally, GraphQL queries can be expensive to execute on the backend (a deeply nested query might trigger hundreds of database calls), creating performance challenges that REST's fixed endpoints avoid.
For your interview: mention GraphQL when the interviewer emphasizes mobile clients, rapidly changing UI requirements, or BFF (Backend-for-Frontend) patterns. Otherwise, default to REST.
gRPC: Binary Service Communication
gRPC is Google's RPC framework that uses HTTP/2 and Protocol Buffers (protobuf) for efficient, typed, binary communication between services.
Where REST sends human-readable JSON over HTTP/1.1, gRPC sends binary protobuf over HTTP/2. The result: 3-10× less bandwidth and significantly faster serialization/deserialization.
// Protocol Buffer definition β strongly typed schema
message User {
string id = 1;
string name = 2;
string email = 3;
}
service UserService {
rpc GetUser (GetUserRequest) returns (User);
rpc ListUsers (ListUsersRequest) returns (stream User); // Server streaming
}
Key advantages:
- Binary protocol: ~3-10× less bandwidth than JSON
- Typed schema: Compile-time type safety; breaking changes caught before deploy
- HTTP/2 native: Multiplexing, streaming, header compression built in
- Code generation: Client and server stubs auto-generated from `.proto` files
- Streaming: Natively supports server streaming, client streaming, and bidirectional streaming
The limitation: Browsers can't natively call gRPC endpoints. There's gRPC-Web as a bridge, but it's clunky. That's why gRPC lives behind your API gateway, not in front of it.
I often see candidates propose gRPC for everything in interviews. Don't do this; it signals that you're optimizing prematurely. Mention gRPC for internal service communication when the interviewer asks about performance or when you have obvious high-throughput service-to-service calls (like a recommendation engine querying a feature store millions of times per second).
SSE: Server Push Events
Server-Sent Events (SSE) is a standard built on HTTP that allows a server to push many messages over a single connection. The client opens a regular HTTP connection, and the server keeps it open β streaming events as they occur.
// Server-side SSE (Node.js)
app.get('/events', (req, res) => {
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Push events as they happen
const send = (data: object) => {
res.write(`id: ${Date.now()}\n`);
res.write(`data: ${JSON.stringify(data)}\n\n`);
};
// Example: push price updates every second
const interval = setInterval(() => {
send({ price: Math.random() * 100, timestamp: Date.now() });
}, 1000);
req.on('close', () => clearInterval(interval));
});
SSE characteristics:
- Unidirectional: Server → client only. Client can't send messages back over the SSE connection (use a separate HTTP POST for that).
- Auto-reconnect: The EventSource API automatically reconnects with the last event ID, and the server resumes from where the client dropped off.
- Text-only: Messages are UTF-8 text (usually JSON). No binary data.
- HTTP-native: Works through all existing HTTP infrastructure: proxies, CDNs, load balancers.
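Under the hood, the SSE wire format is plain text: `id:` and `data:` lines, with a blank line ending each event. A hand-rolled parser sketch shows how little there is to it (browsers do this for you inside `EventSource`; the names below are illustrative, and it ignores the spec's `event:` and `retry:` fields):

```typescript
interface SseEvent {
  id?: string;
  data: string;
}

// Parse a chunk of "text/event-stream" into discrete events.
function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of chunk.split("\n\n")) {   // blank line = end of event
    let id: string | undefined;
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("id: ")) id = line.slice(4);
      else if (line.startsWith("data: ")) data.push(line.slice(6));
    }
    if (data.length) events.push({ id, data: data.join("\n") });
  }
  return events;
}
```

The `id` field is what makes auto-reconnect resumable: the browser sends the last seen id back in a `Last-Event-ID` header, and the server replays from there.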
Where SSE wins: Live dashboards, auction price updates, notification feeds, stock tickers. Anywhere the server needs to push updates and the client just listens. SSE is dramatically simpler to implement and operate than WebSockets for push-only use cases.
Interview tip: SSE before WebSocket
If you need server-to-client push but NOT client-to-server push, use SSE. It's simpler, works through all HTTP infrastructure, and auto-reconnects. Reaching for WebSockets when SSE would suffice signals that you're over-engineering. Say "I'd use SSE here because we only need server push; WebSockets adds bidirectional overhead we don't need."
SSE is the right tool when the server needs to push events and the client just listens: simpler, cheaper, and more infrastructure-compatible than WebSockets.
WebSockets: Bidirectional Channels
WebSockets provide a persistent, full-duplex communication channel between client and server. Unlike HTTP's request-response pattern, either side can send messages at any time without waiting for the other.
// Client-side WebSocket connection
const ws = new WebSocket('wss://chat.example.com/rooms/general');
ws.onopen = () => {
// Connection established β send messages freely
ws.send(JSON.stringify({ type: 'join', user: 'alice' }));
};
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
// Process incoming messages from server or other clients
renderMessage(message);
};
ws.onclose = () => {
// Connection dropped β implement reconnection logic
setTimeout(() => reconnect(), 1000 + Math.random() * 2000);
};
When WebSockets are essential:
- Chat applications: Messages must flow both directions instantly
- Collaborative editing: User keystrokes and cursor positions push to server, server pushes to all other editors
- Real-time gaming: Player input goes to server, game state comes back, continuously
- Trading platforms: Orders go to server, price updates and fills come back
The cost of WebSockets:
- Every connection is stateful: the server must track each connected client in memory
- L4 load balancer required: L7 load balancers terminate HTTP connections, which breaks WebSocket passthrough (unless explicitly configured for WebSocket support)
- Connection management: Reconnection, heartbeats, and authentication on reconnect all become your problem
- Horizontal scaling complexity: If User A is connected to Server 1 and User B to Server 2, broadcasting a message requires inter-server communication (usually via Redis pub/sub or a message queue)
Don't reach for WebSockets unless you genuinely need bidirectional, high-frequency communication. They're powerful but expensive to operate at scale, and announcing WebSockets without justification is a red flag in interviews.
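The horizontal-scaling point deserves a sketch. Here `Bus` is an in-memory stand-in for Redis pub/sub and every name is illustrative: each server subscribes to the room's channel and delivers published messages only to its own connected sockets, so a broadcast reaches users on every server.

```typescript
type Handler = (msg: string) => void;

class Bus { // stand-in for Redis pub/sub
  private subs = new Map<string, Handler[]>();
  subscribe(channel: string, fn: Handler) {
    this.subs.set(channel, [...(this.subs.get(channel) ?? []), fn]);
  }
  publish(channel: string, msg: string) {
    for (const fn of this.subs.get(channel) ?? []) fn(msg);
  }
}

class ChatServer {
  private sockets = new Map<string, Handler>(); // userId → "WebSocket send"
  constructor(private bus: Bus, private room: string) {
    // Each server delivers bus messages only to ITS connected clients.
    bus.subscribe(room, (msg) => this.sockets.forEach((send) => send(msg)));
  }
  connect(userId: string, send: Handler) { this.sockets.set(userId, send); }
  broadcast(msg: string) { this.bus.publish(this.room, msg); } // via the bus, never just locally
}
```

The anti-pattern this avoids: a server looping over only its local sockets, which silently drops every user connected elsewhere.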
WebRTC: Peer-to-Peer Communication
WebRTC enables direct peer-to-peer communication between browsers, with no intermediary server for the data exchange. It's the only application-layer protocol here that uses UDP under the hood, making it ideal for real-time audio and video.
graph LR
%% Styling Definitions
classDef signal fill:#f3e5f5,stroke:#8e24aa,stroke-width:2px,color:#000
classDef stun fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000
classDef turn fill:#ffebee,stroke:#c62828,stroke-width:2px,color:#000
classDef peer fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
classDef media fill:#fff8e1,stroke:#ff8f00,stroke-width:3px,color:#000
%% Peers on the Extremes
A["Peer A<br/>(Caller)"]:::peer
B["Peer B<br/>(Callee)"]:::peer
%% Servers in the Middle
STUN{{"STUN Server<br/>(IP Discovery)"}}:::stun
Sig{{"Signaling Server<br/>(Message Router)"}}:::signal
TURN{{"TURN Server<br/>(Media Relay)"}}:::turn
%% Phase 2: STUN (Placed at top for visual balance)
A <-.->|"Step 5. Query Public IP"| STUN
STUN <-.->|"Step 5. Query Public IP"| B
%% Phase 1: Signaling (SDP & ICE Routing)
A -->|"Step 1. SDP Offer & ICE"| Sig
Sig -->|"Step 2. Forward to B"| B
B -->|"Step 3. SDP Answer & ICE"| Sig
Sig -->|"Step 4. Forward to A"| A
%% Phase 4: Fallback Scenario (TURN Relay)
A -.->|"Step 6b. Relay Media (Fallback)"| TURN
TURN -.->|"Step 6b. Relay Media to B"| B
%% Phase 3: Ideal Scenario (Direct Media - Thick Line)
A <===>|"Step 6a. DIRECT MEDIA STREAM<br/>(UDP/SRTP)"| B:::media
How WebRTC connects peers:
- Signaling: Both clients connect to a central signaling server (your server, using WebSockets or HTTP) to learn about each other and exchange connection metadata
- STUN: Each client contacts a STUN server to discover its own public IP and port (most clients are behind NAT)
- ICE candidates: Clients exchange their discovered addresses via the signaling server
- Connection attempt: Clients try to connect directly using the exchanged addresses
- TURN fallback: If a direct connection fails (restrictive firewalls, symmetric NAT), traffic relays through a TURN server
The reality: About 85% of WebRTC connections succeed via STUN (direct P2P). The remaining 15% require TURN relay. TURN servers carry the actual media traffic, so they're expensive to operate; this is the main operational cost of WebRTC.
For interviews: Use WebRTC only for audio/video calling and conferencing; it's complex, brittle, and overkill for anything else. For 1:1 video calls, WebRTC is perfect. For a 100-person video call, you'll need a Selective Forwarding Unit (SFU), a server that receives all streams and selectively forwards them, partially defeating the peer-to-peer advantage.
WebRTC is an absolute pain to get right, and even the best implementations still suffer connection losses. Stick to video and audio; anything else is a trap.
Choosing the Right Application Protocol
This is where interviews separate candidates who memorize from candidates who reason. There's no single right answer β it depends on your requirements. Here are the decision frameworks.
| Requirement | Protocol | Why not the alternatives |
|---|---|---|
| Standard CRUD API | REST | GraphQL adds complexity with no benefit; gRPC lacks browser support |
| Mobile app with varying data needs | GraphQL | REST over/under-fetches; gRPC doesn't support flexible queries |
| Internal service at 100K+ RPS | gRPC | REST's JSON serialization becomes a bottleneck; GraphQL adds execution overhead |
| Live dashboard, auction updates | SSE | WebSockets adds bidirectional overhead you don't need |
| Chat, collaborative editing | WebSockets | SSE is unidirectional; gRPC not supported in browsers |
| Video/audio calling | WebRTC | WebSockets adds server in the data path; SSE is unidirectional |
| Notification feed for mobile | SSE (or push notifications) | WebSockets drains battery; WebRTC is absurd for text |
The golden rule: use the simplest protocol that meets your requirements. Every step up the complexity ladder (REST → SSE → WebSocket → WebRTC) adds infrastructure cost, operational burden, and failure modes.
Load Balancing
You've picked your protocols and your services are talking to each other. Now what happens when one server can't handle the load? You scale horizontally β add more servers.
More servers means you need a way to distribute traffic across them.
Client-Side Load Balancing
The client itself decides which server to talk to. It periodically fetches a list of available servers from a service registry and selects one using a local algorithm (round-robin, random, least connections).
How it works:
- Client queries a service registry (e.g., Consul, etcd, or a DNS SRV record) for available servers
- Client receives a list of server addresses
- Client picks one using its own load-balancing algorithm
- Client sends the request directly, with no intermediary hop
Where you'll see this:
- gRPC has built-in client-side load balancing: the client resolves a DNS name to multiple IPs and balances across them
- Redis Cluster clients discover all nodes and route `GET`/`SET` commands to the correct shard directly
- DNS round-robin rotates IP addresses so each client gets a different "first" server
The advantage: No extra network hop. The client talks directly to the server. This eliminates the load balancer as both a latency source and a single point of failure.
The disadvantage: Every client must implement the balancing logic. Clients may hold stale server lists. Rolling out a new balancing algorithm means updating every client.
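The client-side pattern above fits in a few lines. `ServiceRegistry` here is a stand-in for Consul, etcd, or a DNS SRV lookup, and every name is illustrative:

```typescript
interface ServiceRegistry {
  lookup(service: string): string[]; // returns "host:port" addresses
}

class RoundRobinClient {
  private servers: string[] = [];
  private next = 0;
  constructor(private registry: ServiceRegistry, private service: string) {
    this.refresh();
  }
  refresh() { // called periodically; the list can go stale in between
    this.servers = this.registry.lookup(this.service);
    this.next = 0;
  }
  pick(): string {
    const server = this.servers[this.next % this.servers.length];
    this.next++;
    return server; // the client then connects directly: no intermediary hop
  }
}
```

Both disadvantages are visible in the sketch: the balancing logic lives in the client, and between `refresh()` calls the server list can be stale.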
For your interview: mention client-side load balancing for internal microservice communication (especially with gRPC). For external traffic, you'll want a dedicated load balancer.
Dedicated Load Balancers
A dedicated load balancer sits between clients and servers, receiving all incoming traffic and distributing it across backend servers. The client sees only the load balancer's address; it has no idea how many servers exist behind it.
The key decision with dedicated load balancers is which OSI layer they operate at: Layer 4 or Layer 7.
Layer 4 Load Balancers
L4 load balancers operate at the transport layer. They route based on IP address and port; they don't inspect the actual content of the packets. Think of them as a transparent pipe: they accept a TCP connection from the client, pick a backend server, and forward all packets from that connection to that server.
L4 characteristics:
- Forwards entire TCP/UDP connections, not individual requests
- Cannot read HTTP headers, cookies, or URL paths
- Very fast: minimal packet inspection
- Preserves client-to-server TCP connection (important for WebSockets)
- Can handle any protocol, not just HTTP
Layer 7 Load Balancers
L7 load balancers operate at the application layer. They read and understand HTTP requests, inspecting headers, URLs, cookies, and request bodies. They terminate the client's TCP/TLS connection and create new connections to backend servers.
L7 characteristics:
- Routes based on URL path, headers, cookies, query parameters
- Terminates and re-establishes TCP/TLS connections
- Can modify requests and responses (add headers, rewrite URLs)
- Handles TLS termination centrally (offloads crypto from backend servers)
- More CPU-intensive than L4 due to packet inspection
- Better for HTTP traffic; supports sticky sessions via cookies
The critical difference: With an L4 load balancer, the client has a TCP connection that passes through the load balancer to a specific server. With an L7 load balancer, the client has a TCP connection to the load balancer, and the load balancer has separate TCP connections to backend servers. This is why L7 load balancers can't natively support WebSockets without explicit configuration: the HTTP Upgrade request terminates at the load balancer.
Interview tip: L4 for WebSockets, L7 for everything else
When you add a load balancer in an interview, specify the layer. "I'd use an L7 load balancer here for HTTP routing and TLS termination" shows you know the trade-off. If you later add WebSockets, say "we'll need L4 load balancing for the WebSocket connections to maintain persistent TCP connections to the backend." That single distinction separates generic answers from precise ones.
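As a concrete illustration of the split, here is roughly what the two modes look like in an nginx configuration. The `http` block does L7 work (path routing, TLS termination); the `stream` block forwards raw TCP at L4 with no HTTP inspection. All names, addresses, and ports are hypothetical:

```nginx
# L7: HTTP routing + TLS termination (illustrative)
http {
    upstream api_servers {
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        listen 443 ssl;
        location /api/ {
            proxy_pass http://api_servers;  # routed per HTTP request
        }
    }
}

# L4: raw TCP forwarding, e.g. for WebSocket traffic (illustrative)
stream {
    upstream ws_servers {
        server 10.0.1.1:9000;
        server 10.0.1.2:9000;
    }

    server {
        listen 9000;
        proxy_pass ws_servers;  # whole connection forwarded, no inspection
    }
}
```

(For completeness: nginx can also proxy WebSockets at L7 by explicitly forwarding the `Upgrade` and `Connection` headers — that is the "explicit configuration" mentioned above.)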
Health Checks and Fault Tolerance
Load balancers don't just distribute traffic; they detect and route around failures. If a backend server crashes, the load balancer stops sending traffic to it within seconds.
How health checks work:
- TCP health check: Attempts a TCP connection to the server. If the connection succeeds, the server is healthy. Fast and low overhead.
- HTTP health check: Sends an HTTP request (usually GET /health) and checks for a 200 response. Can verify that the application is actually working, not just that the port is open.
- Custom health checks: Call a specific endpoint that verifies database connectivity, downstream dependencies, and other application health.
```javascript
// A proper health check endpoint -- not just "return 200"
app.get('/health', async (req, res) => {
  try {
    // Check database connectivity
    await db.query('SELECT 1');
    // Check Redis connectivity
    await cache.ping();
    // Check critical downstream service
    const downstream = await fetch('http://payment-service/health', {
      signal: AbortSignal.timeout(2000) // 2s timeout
    });
    if (!downstream.ok) throw new Error('Payment service unhealthy');
    res.status(200).json({ status: 'healthy' });
  } catch (error) {
    res.status(503).json({ status: 'unhealthy', error: error.message });
  }
});
```
Health checks are the mechanism that makes load balancers reliable. Without them, you're routing traffic to dead servers.
Load Balancing Algorithms
The algorithm determines how the load balancer picks which server handles each request.
| Algorithm | How it works | Best for |
|---|---|---|
| Round Robin | Sequential rotation: server 1, 2, 3, 1, 2, 3... | Stateless services with uniform request cost |
| Random | Random server selection | Similar to round robin; simpler implementation |
| Least Connections | Routes to server with fewest active connections | WebSocket/SSE services with persistent connections |
| Least Response Time | Routes to the fastest-responding server | When some servers are faster (different hardware, proximity) |
| IP Hash | Hash of client IP determines server | Session persistence without cookies |
| Consistent Hashing | Hash ring maps requests to servers | Cache-aware routing; minimizes reshuffling when servers change |
For most stateless HTTP services, round robin or random is correct. The request workload is roughly uniform and any server can handle any request.
For services with persistent connections (WebSocket, SSE), use least connections. Without it, new connections pile up on whatever server is "next" in the round-robin rotation while servers with many lingering connections continue accumulating traffic.
Pick the algorithm that matches your workload pattern, and you'll rarely need to think about it again.
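To make the consistent-hashing row concrete, here's a toy hash ring. The hash function and class are illustrative sketches; a production ring adds virtual nodes per server and uses a stronger hash (e.g. MurmurHash):

```typescript
// Simple FNV-1a-style string hash -- illustrative only.
function hash(key: string): number {
  let h = 2166136261;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h >>> 0;
}

// Toy consistent-hash ring (real implementations add virtual nodes).
class HashRing {
  private ring: { point: number; server: string }[] = [];

  add(server: string): void {
    this.ring.push({ point: hash(server), server });
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(server: string): void {
    this.ring = this.ring.filter(n => n.server !== server);
  }

  // Walk clockwise from the key's hash to the first server point,
  // wrapping around to the start of the ring if necessary.
  lookup(key: string): string {
    const h = hash(key);
    const node = this.ring.find(n => n.point >= h) ?? this.ring[0];
    return node.server;
  }
}
```

The payoff: removing one server only remaps the keys that hashed to it. With naive `hash(key) % N` routing, changing N remaps almost every key, which is disastrous for cache-aware routing.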
Handling Network Failures
Distributed systems fail: not "might fail," they will fail. The question isn't whether your network calls will break, but whether your system degrades gracefully when they do. This is where senior candidates separate themselves.
Timeouts and Retries with Backoff
The most fundamental reliability pattern: set a timeout on every network call and retry on failure.
Timeouts prevent one slow service from consuming all your resources. Without a timeout, a stalled downstream service can hold your threads/connections indefinitely until your entire thread pool is exhausted. Set a timeout on every outbound network call, every single one.
Retries handle transient failures: brief network hiccups, momentary server overloads, temporary DNS resolution failures. Most transient issues resolve within seconds.
Backoff prevents retries from hammering a recovering service. Instead of retrying immediately, you wait, and each subsequent retry waits exponentially longer.
```typescript
class NonRetryableError extends Error {}

async function fetchWithRetry<T>(
  url: string,
  options: { maxRetries?: number; baseDelayMs?: number; timeoutMs?: number } = {}
): Promise<T> {
  const { maxRetries = 3, baseDelayMs = 200, timeoutMs = 5000 } = options;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, {
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (response.ok) return (await response.json()) as T;
      // 4xx: the request itself is bad -- retrying won't help
      if (response.status < 500) throw new NonRetryableError(`Client error: ${response.status}`);
      // 5xx: retryable server-side failure
      throw new Error(`Server error: ${response.status}`);
    } catch (error) {
      if (error instanceof NonRetryableError || attempt === maxRetries) throw error;
    }
    // Exponential backoff with jitter: 200ms, 400ms, 800ms (plus 0-50% jitter)
    const delay = baseDelayMs * Math.pow(2, attempt);
    const jitter = delay * 0.5 * Math.random();
    await new Promise(resolve => setTimeout(resolve, delay + jitter));
  }
  throw new Error('Unreachable');
}
```
Jitter is not optional
Without jitter, all clients retry at exactly the same intervals, creating synchronized waves of retries that repeatedly spike load on a struggling service. Jitter randomizes the retry timing so retries spread across a window instead of firing simultaneously. It's the difference between a recovery and a sustained outage. Always add jitter to your backoff.
The magic phrase interviewers look for: "Retry with exponential backoff and jitter." That sentence signals you understand both the retry mechanism and the failure mode it can create.
Idempotency
Retries are dangerous when operations have side effects. If you retry a payment request, you might charge the customer twice. Idempotency means that performing the same operation multiple times produces the same result as performing it once.
How to make writes idempotent:
- Assign each operation a unique idempotency key (a UUID, or a natural key like user_id + timestamp)
- Before processing, check if that key has already been processed
- If already processed, return the stored result without re-executing
- If new, process the operation and store the result keyed by the idempotency key
```typescript
async function processPayment(req: PaymentRequest): Promise<PaymentResult> {
  const idempotencyKey = req.headers['idempotency-key'];
  // Check if we've already processed this request
  const existing = await db.query(
    'SELECT result FROM processed_payments WHERE idempotency_key = $1',
    [idempotencyKey]
  );
  if (existing.rows.length > 0) {
    return existing.rows[0].result; // Return stored result -- no re-charge
  }
  // Process the payment
  const result = await chargeCard(req.body.amount, req.body.cardToken);
  // Store result for future duplicate detection. A unique constraint on
  // idempotency_key prevents two concurrent duplicates from both inserting.
  await db.query(
    'INSERT INTO processed_payments (idempotency_key, result) VALUES ($1, $2)',
    [idempotencyKey, JSON.stringify(result)]
  );
  return result;
}
```
HTTP GET, PUT, and DELETE are inherently idempotent. POST requires explicit idempotency implementation. In interviews, whenever you propose retries for write operations, immediately follow with "using idempotency keys to prevent duplicate processing."
Circuit Breakers
When a downstream service is consistently failing, retries make things worse: you're adding load to a system that's already drowning. Circuit breakers detect sustained failures and stop calling the failing service entirely, giving it time to recover.
```mermaid
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure count<br/>exceeds threshold
    Open --> HalfOpen : Timeout expires<br/>(cool-down period)
    HalfOpen --> Closed : Test request<br/>succeeds
    HalfOpen --> Open : Test request fails
    note right of Closed : Normal operation
    note right of Open : Fail fast - no calls
    note right of HalfOpen : Test with single request
```
The three states:
- Closed (normal): Requests flow through normally. Failures are counted.
- Open (tripped): All requests immediately fail without calling the downstream service. No load on the failing dependency. The circuit "trips" when failures exceed a threshold (e.g., 5 failures in 10 seconds).
- Half-Open (testing): After a cool-down period, one test request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit reopens.
The insight most candidates miss: circuit breakers are as much about protecting the failing service as they are about protecting your service. By stopping the flood of requests, you give the downstream service room to recover instead of pummeling it with retries it can't handle.
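The state machine above fits in a small class. This is a minimal sketch; the thresholds, timings, and injectable clock are illustrative choices, and production libraries (Resilience4j, Polly, opossum) add rolling failure windows and metrics:

```typescript
type State = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: State = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,   // consecutive failures before tripping
    private cooldownMs = 30_000,    // how long to fail fast before testing
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error('Circuit open: failing fast'); // no load on dependency
      }
      this.state = 'half-open'; // cool-down expired: allow one test request
    }
    try {
      const result = await fn();
      this.state = 'closed'; // success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'; // trip (or re-trip after a failed test request)
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Note how the open state does the protecting: the downstream service sees zero traffic until the cool-down expires, and even then only a single probe.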
Interview tip: combine all three patterns
The strongest failure-handling answer combines all three: "Each service call has a 3-second timeout. On timeout, we retry with exponential backoff and jitter, up to 3 attempts, using idempotency keys to prevent duplicate processing. If a service fails consistently, a circuit breaker trips, failing fast for 30 seconds before testing again." That's three sentences, and it covers every failure pattern interviewers look for.
Regionalization and Latency
For global services, the physical distance between clients and servers determines your baseline latency. Light travels through fiber at about 200,000 km/s, so a New York to London round trip (roughly 5,600 km each way) has a theoretical minimum of ~56ms, before any processing.
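The arithmetic behind that number, as a quick sanity check (the distance figure is approximate):

```typescript
// Light in fiber travels at roughly 2/3 the speed of light in vacuum: ~200,000 km/s.
const fiberSpeedKmPerSec = 200_000;
const nyToLondonKm = 5_600; // approximate one-way distance

// A round trip covers the distance twice; multiply by 1000 to convert s -> ms.
const rttMs = (2 * nyToLondonKm * 1000) / fiberSpeedKmPerSec;
console.log(rttMs); // 56 -- the physical floor, before any processing or queuing
```

No amount of server optimization gets you under that floor; the only fix is moving the endpoints closer together, which is what the next two sections are about.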
Content Delivery Networks (CDNs)
CDNs cache static and semi-static content at edge locations close to users; Cloudflare operates in 300+ cities, and CloudFront has 600+ edge locations. When a user requests a cached asset, it's served from a server that might be 10ms away instead of 200ms away.
For interviews, reach for a CDN whenever you have static content (images, videos, JS bundles) or highly cacheable API responses (search results, product catalogs). The CDN itself is covered in depth in the CDN article; here, just know it's the networking solution to geographic latency for read-heavy data.
Regional Partitioning
When your data is inherently geographic (Uber rides are local, DoorDash orders are local), partition your services and data by region. Each region handles all reads and writes for its geography with co-located servers and databases.
Why this works: If Uber's Miami users only care about Miami drivers, put both the user data and driver data for Miami in a US-Southeast region. Every query stays within a 5ms network boundary instead of crossing 100ms to a central database.
The complication: Cross-region queries. What if a Miami user wants to see their ride history from when they visited New York? You need either cross-region reads (slow) or asynchronous replication of user data to every region (complex, potentially stale). Regional partitioning only works cleanly when the data access pattern is genuinely regional.
CDNs for static content, regional partitions for geographic data: those two tools handle 90% of global latency challenges.
Trade-offs
| Pros | Cons |
|---|---|
| Protocol selection directly optimizes for your workload (latency, throughput, reliability) | More protocol-aware architecture = more complexity and specialized knowledge required |
| Load balancers enable horizontal scaling and automatic failover | Load balancers are an additional hop, potential SPOF, and infrastructure cost |
| Retries and circuit breakers make the system resilient to transient failures | Retry storms can amplify failures; circuit breakers can mask underlying problems |
| Regional deployment reduces latency for global users | Cross-region replication adds consistency challenges and infrastructure cost |
| gRPC's binary protocol provides 3-10× efficiency for internal communication | Mixed protocol stack (REST external, gRPC internal) adds cognitive and operational overhead |
| WebSockets enable real-time bidirectional communication | Stateful connections complicate horizontal scaling and require specialized infrastructure |
The fundamental tension here is simplicity vs. performance and resilience: every networking sophistication (protocol optimization, smart load balancing, regional deployment) reduces latency or improves reliability but adds infrastructure you must understand, monitor, and maintain. The right answer is always the simplest architecture that meets your requirements; then optimize only the bottlenecks you can prove exist.
When to Use It / When to Avoid It
OK, so the question isn't really "when to use networking"; you can't avoid it. The real question is when to invest in advanced networking patterns vs. keeping things simple.
Invest in networking sophistication when:
- Your services communicate at high frequency (>10K RPS between two services): protocol choice matters
- You have real-time requirements (chat, live updates, gaming): SSE/WebSocket/WebRTC is necessary
- Your users are globally distributed: CDN and regional deployment are essential
- Reliability is critical (payments, order processing): retries, idempotency, and circuit breakers are table stakes
- Internal service mesh is complex (20+ microservices): client-side load balancing and gRPC pay off
Keep it simple when:
- You have a monolith or 2-3 services: REST over HTTP is fine for everything
- Traffic is moderate (under 1K RPS): default round-robin load balancing works
- Users are in one region: skip regional deployment complexity
- You're prototyping: don't design for global scale on day one
If your system design fits on a whiteboard with REST and a single load balancer, don't add complexity to impress your interviewer. Start simple, then optimize the bottleneck they ask about. Complexity you don't need is complexity that will break.
Real-World Examples
Discord β 100M+ Active Users on WebSockets and Elixir
Discord serves over 100 million monthly active users with real-time messaging, voice, and video. Every connected client maintains a WebSocket connection for instant message delivery. Their gateway servers (written in Elixir) each handle ~1 million concurrent WebSocket connections.
The non-obvious lesson: Discord's biggest scaling challenge wasn't the WebSocket connections themselves; it was fan-out. A message sent to a server with 500,000 members must be delivered to 500,000 WebSocket connections, possibly spread across hundreds of gateway servers. They solved this with a custom pub/sub layer that routes messages only to the gateway servers with relevant connected clients, not to all gateway servers.
Cloudflare β Anycast DNS at 50M+ Requests/Second
Cloudflare's DNS resolver (1.1.1.1) handles over 50 million queries per second across 300+ cities worldwide. They use Anycast: the same IP address is advertised from every data center, and BGP routing directs each query to the nearest one.
The non-obvious lesson: Even with Anycast, Cloudflare still has to deal with asymmetric routing. A client might send a DNS query to one data center but receive the response from a different one (due to BGP route changes). Their stateless UDP design makes this work: every server can independently answer any query without knowing about the others.
Stripe β Idempotency at Financial Scale
Stripe processes hundreds of billions of dollars in payments annually. Every mutation API supports idempotency keys, allowing clients to safely retry failed payment requests without double-charging. The idempotency system stores the complete response of the first successful request and returns it verbatim on subsequent retries β ensuring byte-identical responses regardless of how many times a request is retried.
The non-obvious lesson: Stripe's idempotency keys expire after 24 hours. They found that keys lasting longer than that caused more confusion than benefit: stale keys preventing legitimate new operations. The 24-hour window is long enough to handle any retry scenario but short enough to prevent accidental deduplication of genuinely new requests.
How This Shows Up in Interviews
Networking knowledge rarely gets its own dedicated question β it surfaces as a cross-cutting concern across every system design problem. When you draw an arrow between two services, your interviewer may probe on what protocol that arrow uses, how it handles failure, and how it scales.
When to bring it up proactively:
- When you draw your first service-to-service arrow: name the protocol (REST, gRPC)
- When you need real-time updates: justify your choice of SSE vs WebSocket vs polling
- When discussing scalability: specify the load balancer type (L4/L7) and algorithm
- When an interviewer asks "what happens if this service goes down": timeouts, retries, circuit breakers
Depth expected by level:
- Mid-level: Know TCP vs UDP, REST APIs, and basic load balancing. Can explain why you'd use a load balancer.
- Senior: Can justify protocol choices (REST vs gRPC), explain L4 vs L7 trade-offs, and design retry strategies with idempotency. Understands WebSocket scaling challenges.
- Staff: Can discuss QUIC, connection pooling, regional partitioning, circuit breaker state machines, and DNS load balancing failure modes. Understands the full stack implications of protocol choices.
Interview tip: name the protocol on every arrow
When drawing your architecture diagram, label each arrow with the protocol. "REST over HTTPS" between client and API gateway. "gRPC" between internal services. "WebSocket" for real-time connections. This takes 10 seconds and signals protocol awareness that most candidates skip.
| Interviewer asks | Strong answer |
|---|---|
| "How do services communicate here?" | "REST over HTTPS for the public API, gRPC for internal service calls where we need low latency and type safety." |
| "What happens if the payment service goes down?" | "3-second timeout, retry with exponential backoff and jitter up to 3 times, idempotency key on every payment request. If failures persist, the circuit breaker trips: we return a 'payment pending' status and process via async queue." |
| "Why WebSockets and not SSE?" | "Because our chat feature requires bidirectional messaging β the client needs to send messages and receive them in real-time. SSE only supports server push." |
| "How do you handle global users?" | "CDN for static assets, regional API servers with co-located databases, and DNS-based routing to the nearest region. Cross-region reads fall back to the primary with ~100ms added latency." |
| "Why not use gRPC for the public API?" | "Browsers can't natively call gRPC. We'd need gRPC-Web as a bridge, adding complexity. REST is universally supported, and the serialization overhead isn't our bottleneck; database queries are." |
Quick Recap
- Networking determines how your distributed services communicate: every arrow on your whiteboard is a protocol choice with latency, reliability, and scalability implications.
- TCP is the default transport protocol; use UDP only when packet loss is acceptable and latency is critical (streaming, gaming). QUIC is TCP's modern upgrade for mobile-first services.
- REST is the default API protocol; reach for GraphQL when client flexibility matters, gRPC when internal service throughput is critical, SSE for server push, and WebSockets only when you need bidirectional real-time communication.
- Load balancers distribute traffic and detect failures: L7 for HTTP (most common), L4 for WebSockets, client-side for internal microservices. Always specify the layer in your interview.
- Networks fail, and your design must handle it: timeouts on every call, retries with exponential backoff and jitter, idempotency keys for write operations, and circuit breakers for cascading failure prevention.
- Geographic latency is bounded by physics: CDNs for static content, regional partitioning for geographic data, and DNS-based routing for directing users to the nearest region.
- The strongest interview signal is naming the protocol on every arrow: "REST over HTTPS for the public API, gRPC for internal calls, WebSocket for real-time features" shows deliberate engineering, not handwaving.
Related Concepts
- Load Balancing: Deep dive into load balancing algorithms, health checks, and scaling strategies. If your interview probes "how does the load balancer work," this article has the answers.
- Caching: The most effective way to reduce network round trips. Understanding cache layers (browser, CDN, application) is the complement to understanding the network calls they eliminate.
- CDN: Detailed coverage of Content Delivery Networks, edge caching strategies, and cache invalidation. The networking layer's primary tool for reducing global latency.
- Scalability: Horizontal and vertical scaling patterns that networking enables. Load balancing, connection pooling, and regional deployment are the networking implementation of scalability principles.
- Microservices: Service-to-service communication is the core of microservices architecture. Protocol selection (REST vs gRPC), service discovery, and failure handling directly apply.