πŸ“HowToHLD
Vote for New Content
Vote for New Content
Home/High Level Design/Concepts

Networking

Master the networking protocols, load balancing strategies, and failure-handling patterns that underpin every system design interview β€” from TCP vs UDP to L4 vs L7 load balancers.

74 min read Β· 2026-03-25 Β· medium Β· networking, tcp, http, load-balancing, protocols, dns

TL;DR

  • Networking is the connective tissue of every distributed system. Every service you draw on a whiteboard needs to talk to other services β€” and how they talk determines your latency, reliability, and scalability ceiling.
  • Three layers matter for interviews: IP handles addressing and routing, TCP/UDP handle reliable (or fast) delivery, and application protocols (HTTP, WebSockets, gRPC) define how your services exchange data.
  • TCP is your default. Use UDP only when you can tolerate packet loss and need minimal latency (live video, gaming). QUIC is the modern upgrade path β€” mention it to impress, default to TCP.
  • REST is your default API. Reach for GraphQL when flexible client queries matter, gRPC when internal service throughput is critical. For real-time push, SSE covers most use cases; WebSockets when you need bidirectional; WebRTC only for peer-to-peer audio/video.
  • Load balancers distribute traffic and detect failures. L7 for HTTP traffic, L4 for WebSocket or raw TCP. Client-side load balancing for internal microservices. Always mention health checks.
  • Networks fail. Retries with exponential backoff, idempotency keys, and circuit breakers are how you survive it β€” and every senior interviewer expects you to know them.

The Problem It Solves

Your microservices architecture looks perfect on the whiteboard. Five services, clean arrows, a database behind each one. You deploy to production and everything works β€” for about three hours.

Then the order service tries to call the payment service. The payment service is overloaded and takes 30 seconds to respond. The order service holds the connection open, eating one of its 200 threads.

More orders come in, each blocking on the payment service. Within four minutes, the order service has exhausted its thread pool β€” and now the API gateway can't reach the order service either. Users see 503s across the board.

Nothing crashed. No server died. The network between two services just got slow β€” and that slowness cascaded through your entire system because nobody thought about how services actually communicate.

Five microservices connected directly to each other with no load balancing, no timeouts, and no failure handling β€” one slow service cascades failures to all others.
Without networking awareness, one slow service can take down your entire architecture. The arrows between services aren't free β€” each one carries latency, failure modes, and capacity limits.

I see this pattern in almost every first-attempt system design. Engineers draw boxes and arrows but treat the arrows as magic β€” instant, reliable, free. They're not.

Every arrow is a network call with latency, a protocol with overhead, and a connection that can fail. Understanding networking transforms those arrows from handwaving into deliberate engineering decisions.

The network is not reliable β€” and your design must prove you know that

The eight fallacies of distributed computing start with "the network is reliable." In interviews, the difference between a mid-level and senior answer is whether you treat network calls as infallible or design around their failure. Every service-to-service arrow on your whiteboard needs a timeout, a retry strategy, and a plan for when it fails.
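The timeout-plus-retry discipline is easy to sketch. Here is a minimal retry helper with exponential backoff and full jitter (TypeScript; the attempt counts and delay bounds are illustrative, not prescriptive):

```typescript
// Exponential backoff with full jitter: each retry waits a random amount
// up to base * 2^attempt ms, capped at maxMs. Illustrative defaults only.
function backoffDelay(attempt: number, baseMs = 100, maxMs = 10_000): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.random() * cap; // full jitter avoids synchronized retry storms
}

// Retry an async operation up to maxAttempts times, backing off between tries.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of attempts: surface the error
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

Pair this with idempotency keys on the server side so that a retried request which actually succeeded the first time doesn't execute twice.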

The arrows between your services carry your system's entire communication burden β€” getting them right is the difference between a resilient architecture and a house of cards.


What Is It?

Networking is how independent machines exchange data across physical and virtual connections. In system design, it's the set of protocols, patterns, and infrastructure that determine how your services discover each other, communicate, handle failures, and scale.

Analogy: Think of a large hospital. Doctors, nurses, pharmacists, and lab technicians are all specialists (services) who need to coordinate patient care. They don't all stand in the same room shouting β€” they use pagers (UDP), phone calls (TCP), written orders on clipboards (HTTP), and real-time intercoms during surgery (WebSockets).

The pager is fast but you might miss a page, while the phone call guarantees you reach someone but takes time to dial. The clipboard order creates a paper trail but is slow. During surgery, you need immediate two-way communication β€” nothing else will do.

Networking in system design is about choosing the right communication channel for each interaction, understanding the cost of each choice, and designing for what happens when the channel breaks.

Three-layer networking stack for system design showing Network Layer (IP), Transport Layer (TCP/UDP/QUIC), and Application Layer (HTTP/WebSocket/gRPC) with descriptions of each layer's responsibility.
You don't need all 7 OSI layers β€” these 3 cover 95% of what comes up in system design interviews. Each layer builds on the one below, adding abstractions that simplify the developer's job.

For your interview: know these three layers and what each one gives you. The network layer handles addressing (IP), the transport layer handles reliability (TCP) or speed (UDP), and the application layer handles your business logic protocol (HTTP, WebSockets, gRPC). Everything else is implementation detail you can skip unless asked.


How It Works

Let's trace a single web request end-to-end. When you type example.com into your browser, a carefully orchestrated sequence of protocol interactions unfolds across all three layers. Understanding this flow is the foundation for every networking decision in system design.

sequenceDiagram
    participant B as πŸ‘€ Browser
    participant DNS as 🌐 DNS Resolver
    participant S as βš™οΈ Web Server

    Note over B,S: Step 1: DNS Resolution
    B->>DNS: What IP is example.com?
    DNS-->>B: 93.184.216.34

    Note over B,S: Step 2: TCP 3-Way Handshake
    B->>S: SYN β€” "I want to connect"
    S-->>B: SYN-ACK β€” "Acknowledged, let's go"
    B->>S: ACK β€” "Connection established"

    Note over B,S: Step 3: TLS Handshake (HTTPS)<br/>TLS 1.3: 1 RTT Β· TLS 1.2: 2 RTTs
    B->>S: ClientHello + supported ciphers
    S-->>B: ServerHello + certificate
    B->>S: Key exchange Β· encrypted channel ready

    Note over B,S: Step 4: HTTP Request/Response
    B->>S: GET / HTTP/1.1<br/>Host: example.com
    activate S
    S-->>B: HTTP 200 OK<br/>Content-Type: text/html<br/>[page content]
    deactivate S

    Note over B,S: Step 5: TCP Teardown
    B->>S: FIN β€” "I'm done"
    S-->>B: ACK + FIN
    B->>S: ACK β€” "Connection closed"

Here's what happened across those layers:

  1. DNS resolution (Application Layer) β€” Your browser translates example.com into an IP address like 93.184.216.34. This lookup usually takes 1–50ms depending on caching.
  2. TCP handshake (Transport Layer) β€” A three-way handshake (SYN β†’ SYN-ACK β†’ ACK) establishes a reliable, ordered byte stream. One round trip of latency before any data flows.
  3. TLS handshake (Transport/Application) β€” For HTTPS, another 1–2 round trips to negotiate encryption. TLS 1.3 reduces this to one round trip; 0-RTT resumption eliminates it for returning visitors.
  4. HTTP request/response (Application Layer) β€” Your browser sends GET / HTTP/1.1 with headers; the server returns 200 OK with the page content.
  5. TCP teardown β€” A four-way handshake (FIN β†’ ACK β†’ FIN β†’ ACK) closes the connection cleanly.
// The entire sequence above in a single line of application code:
const response = await fetch('https://example.com');
// Underneath: DNS lookup + TCP connect + TLS negotiate + HTTP transfer + TCP close
// Total latency: DNS (1-50ms) + TCP RTT (1-100ms) + TLS (1-100ms) + server processing

The key observation: one conceptual "request" involves many round trips at lower layers. The higher you go in the stack, the more convenient the abstraction β€” but also the more latency you're paying. This tension between convenience and performance surfaces in every protocol decision you'll make.

Why this matters for your design

Without HTTP keep-alive or HTTP/2 multiplexing, every single request repeats the TCP and TLS handshakes. For a webpage that loads 50 assets, that's 50 Γ— (TCP + TLS) = potentially seconds of overhead. This is why connection reuse is the single most impactful HTTP optimization β€” and why HTTP/2 multiplexing was invented.

Every protocol decision you make in an interview carries this overhead. The question isn't just "what data do I send?" β€” it's "how many round trips does it cost and can I afford them?"
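To make that round-trip arithmetic concrete, here is a back-of-envelope calculator for the keep-alive example above (the RTT numbers are assumptions you would replace with real measurements):

```typescript
// Rough latency budget for N sequential requests, with and without
// connection reuse. Assumes TLS 1.3 (one round trip) and ignores
// server processing time; purely illustrative.
function pageLoadMs(assets: number, rttMs: number, reuseConnection: boolean): number {
  const tcpHandshake = rttMs; // SYN -> SYN-ACK -> ACK: one round trip
  const tlsHandshake = rttMs; // TLS 1.3: one round trip
  const perRequest = rttMs;   // the request/response itself
  const setup = tcpHandshake + tlsHandshake;
  return reuseConnection
    ? setup + assets * perRequest    // pay connection setup once
    : assets * (setup + perRequest); // pay setup for every asset
}

// 50 assets over a 50ms-RTT link, requested sequentially:
pageLoadMs(50, 50, false); // 7500 ms without keep-alive
pageLoadMs(50, 50, true);  // 2600 ms with one reused connection
```

Real browsers parallelize across a handful of connections, so the absolute numbers shrink, but the ratio between the two strategies is the point.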

Key Components

| Component | Role |
| --- | --- |
| DNS | Translates domain names to IP addresses; first step of every request |
| TCP | Reliable, ordered byte stream β€” default transport for all web traffic |
| UDP | Best-effort, connectionless transport β€” used when speed beats reliability |
| HTTP/HTTPS | Stateless request-response protocol β€” the foundation of web APIs |
| Load Balancer | Distributes traffic across servers; detects and routes around failures |
| TLS | Encrypts data in transit; mandatory for any production system |
| WebSocket | Persistent bidirectional channel for real-time communication |
| gRPC | Binary RPC framework for high-performance internal service communication |

The Networking Stack

While the full OSI model has 7 layers, only three consistently appear in system design interviews. Let's go through each one and understand what it gives us as application developers.

Network Layer β€” IP

The Internet Protocol (IP) handles two things: addressing (where is the destination?) and routing (how do packets get there?). Every machine on a network gets an IP address β€” either assigned by DHCP when it boots or configured statically.

Public IPs are routable across the internet. The backbone infrastructure knows that addresses starting with 17.x.x.x belong to Apple, and routes packets accordingly. Private IPs (like 10.0.0.x or 192.168.x.x) only work within a local network and require Network Address Translation (NAT) to reach the public internet.
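A quick way to internalize the private ranges is to write the check. This sketch covers the three RFC 1918 IPv4 blocks (the 172.16.0.0/12 block is the one most people forget):

```typescript
// RFC 1918 private IPv4 ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
// Returns false for anything that isn't a well-formed dotted quad.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => Number.isNaN(p) || p < 0 || p > 255)) {
    return false;
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}
```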

For your interview: IP is plumbing. You almost never need to discuss it explicitly. The one exception is when you're designing for multi-region deployments β€” then you'll need to talk about IP-based routing, Anycast (multiple servers sharing one IP for geographic routing), and how DNS maps domain names to different IPs in different regions.

Transport Layer β€” TCP, UDP, and QUIC

This is where things get interesting for system design. The transport layer determines the reliability and performance characteristics of your communication.

The next section breaks down each protocol in detail, but here's the one-liner: TCP guarantees delivery and ordering at the cost of latency. UDP sacrifices both for speed. QUIC gives you TCP's reliability with UDP's performance β€” but it's still gaining adoption.

Application Layer β€” Where You Live

Everything above the transport layer is the application layer β€” HTTP, WebSockets, gRPC, DNS, and every custom protocol you might design. This is where 90% of your interview decisions happen.

The application layer runs in user space, meaning you control it entirely. Transport and below run in the kernel β€” fast, but inflexible. This distinction matters: changing your HTTP serialization format is a deploy, but changing TCP congestion control requires a kernel update across your fleet.

Most of your design decisions live at the application layer. The transport and network layers are infrastructure choices you make once and rarely revisit. For your interview: spend your time on application-layer decisions β€” that's where you have control and where interviewers expect depth.


Transport Protocols Deep Dive

For most system design interviews, the real choice is between TCP and UDP. QUIC is increasingly relevant but still supplementary knowledge. Let me walk through each one.

TCP β€” The Reliable Workhorse

Transmission Control Protocol (TCP) is a connection-oriented, reliable, ordered byte stream protocol. It guarantees that data arrives in the order it was sent, retransmitting anything lost along the way.

TCP three-way handshake showing SYN, SYN-ACK, ACK sequence with timing annotations, followed by data transfer and four-way FIN teardown.
The TCP handshake costs one full round trip before any data flows. For cross-continent connections (~100ms RTT), that's 100ms of pure overhead per new connection β€” which is why connection pooling and keep-alive exist.

The connection is called a "stream" β€” a stateful, ordered channel between client and server. Two messages sent on the same stream arrive in the same order. TCP handles acknowledgement, retransmission, flow control (don't overwhelm the receiver), and congestion control (don't overwhelm the network).

Key characteristics:

  • Connection-oriented: Three-way handshake before data flows
  • Reliable delivery: Every byte acknowledged; lost packets retransmitted
  • Ordering guaranteed: Bytes arrive in the order sent
  • Flow control: Receiver advertises how much data it can handle
  • Congestion control: Sender adapts rate to avoid network overload

TCP is the default for almost everything. If you're not sure which transport protocol to use, use TCP. Interviewers expect it as the baseline and won't ask you to justify it.

UDP β€” Speed Over Safety

User Datagram Protocol (UDP) is a connectionless, best-effort protocol. No handshake, no acknowledgements, no ordering. You fire packets into the void and hope they arrive.

What you get for that lack of guarantees is speed. UDP adds only 8 bytes of header (vs TCP's 20–60 bytes) and has zero connection setup overhead. The first byte of real data can be on the wire immediately.

Key characteristics:

  • Connectionless: No handshake, no state, no teardown
  • Best-effort delivery: Packets can be lost, duplicated, or reordered
  • No flow/congestion control: Sender can blast at any rate
  • Minimal overhead: 8-byte header, no ACK traffic

So why would anyone use a protocol that doesn't guarantee delivery? Because for some applications, getting data fast is more important than getting every packet.

Side-by-side comparison showing TCP with connection setup overhead, ordered delivery, and retransmission on one side, and UDP with immediate transmission, possible loss, and no ordering on the other.
TCP pays for reliability with latency overhead. UDP trades reliability for raw speed. 90% of the time you want TCP. The 10% where you want UDP β€” you'll know it immediately because 'losing a few packets is fine' is a requirement.

When UDP wins:

  • Live video/audio streaming β€” a dropped frame is invisible; a retransmitted frame arrives too late to display
  • Online gaming β€” knowing where a player was 200ms ago is useless; you want their position now
  • DNS lookups β€” small, stateless queries where retrying from scratch is faster than TCP handshake + retry
  • Telemetry/metrics collection β€” losing 0.1% of data points doesn't affect aggregates

The browser problem with UDP

Browsers don't natively support UDP sockets. The only way to send UDP from a browser is through WebRTC (covered below). If your design needs UDP-like speed for browser clients, you'll need WebRTC for real-time media or fall back to HTTP/WebSocket for everything else. App-native clients (iOS/Android) can use UDP directly.

My recommendation: default to TCP in interviews. When you reach for UDP, you should be able to say exactly why packet loss is acceptable in your use case. If you can't articulate that in one sentence, stick with TCP.

QUIC β€” The Modern Compromise

QUIC is a transport protocol built on top of UDP that provides TCP-like reliability with significant performance improvements. Originally developed at Google and since standardized by the IETF (RFC 9000), it's the transport HTTP/3 runs on. It's gaining adoption rapidly β€” Chrome, Firefox, and Safari all support it, and Cloudflare and Google serve significant traffic over QUIC.

What QUIC fixes:

  • Zero-RTT connection establishment β€” Returning clients can send data immediately, no handshake wait
  • No head-of-line blocking β€” In TCP, one lost packet blocks all streams. QUIC multiplexes independent streams, so a lost packet only stalls the affected stream
  • Built-in encryption β€” TLS 1.3 is mandatory and integrated into the handshake, reducing total round trips
  • Connection migration β€” When your phone switches from Wi-Fi to cellular, the connection survives because QUIC identifies connections by ID, not by IP:port

For interviews, think of QUIC as "better TCP." Mention it when discussing mobile-first designs or global services β€” your interviewer will be impressed. But don't build your entire design around it; TCP is the safe, universal default.

Choosing Your Transport Protocol

Here's the honest decision framework. Most of the time this decision is obvious β€” the hard part is knowing the exceptions.

Decision tree for choosing between TCP, UDP, and QUIC. Starts with 'Can you tolerate packet loss?' β€” No leads to TCP/QUIC, Yes leads to 'Is latency critical?' β€” Yes leads to UDP.
The transport protocol decision is usually straightforward: TCP by default, UDP when packet loss is acceptable and latency is critical, QUIC when you want TCP's guarantees with modern performance.

| Scenario | Protocol | Why |
| --- | --- | --- |
| Web APIs, database connections, file transfers | TCP | Data integrity is non-negotiable |
| Live video/audio streaming | UDP | Late data is useless; drop it |
| Online gaming (position updates) | UDP | Stale positions are worse than missing ones |
| DNS lookups | UDP | Tiny stateless queries; retry is cheaper than handshake |
| Mobile-first HTTP services | QUIC | Connection migration + reduced HOL blocking |
| IoT telemetry (high volume, lossy OK) | UDP | Losing 0.1% of sensor readings is fine |
| Internal microservice communication | TCP (or QUIC) | Reliability between services is table stakes |

The bottom line: TCP until you have a specific reason for UDP. QUIC if you want bonus points and your clients support it.


Application Layer Protocols

The application layer is where most of your interview design decisions live. These protocols define how your services exchange data β€” and each one carries its own set of trade-offs around performance, flexibility, and complexity.

HTTP/HTTPS β€” The Web's Foundation

Hypertext Transfer Protocol (HTTP) is a stateless, request-response protocol. The client sends a request, the server sends a response, and neither remembers the other. Every web page, every API call, every image download β€” HTTP.

Anatomy of an HTTP request showing method, path, headers, and body on the request side, and status code, headers, and body on the response side.
HTTP's simplicity is its superpower. A request has a method, a path, headers, and optionally a body. A response has a status code, headers, and a body. That's it β€” everything else is layered on top.

HTTP is stateless by design β€” and that's a feature, not a limitation. Stateless services are dramatically easier to scale: any server can handle any request because no server needs to remember previous interactions. Move session state to Redis, auth tokens to JWTs, and keep your HTTP servers as pure functions of (request β†’ response).

Key concepts you should know:

| Concept | Examples | Interview relevance |
| --- | --- | --- |
| Request methods | GET, POST, PUT, PATCH, DELETE | Demonstrates REST understanding |
| Status codes | 200, 201, 301, 400, 401, 403, 404, 429, 500, 502, 503 | Error handling design decisions |
| Headers | Content-Type, Authorization, Cache-Control, Accept-Encoding | Caching, auth, content negotiation |
| Body | JSON, protobuf, form data, multipart | Serialization format choice |

HTTPS wraps HTTP in TLS encryption. For any production system, HTTPS is non-negotiable β€” your interviewer assumes it. Don't burn interview time explaining that you'll use HTTPS; just use it.

HTTP is the foundation everything else builds on. Know it cold β€” not because it's exciting, but because every other protocol is defined in contrast to it.

REST β€” Resource-Oriented APIs

REST (Representational State Transfer) is not a protocol β€” it's a convention for using HTTP to build APIs. The core idea: model your system as resources (nouns) and use HTTP methods (verbs) to operate on them.

RESTful API resource mapping showing Users, Posts, and Comments as resources with GET, POST, PUT, DELETE operations mapped to CRUD operations, with nested resource paths like /users/{id}/posts.
REST maps naturally to CRUD operations on resources. If you've identified your core entities in a system design, you've already mapped out your REST API. The resource paths mirror your data model.
// A simple RESTful API for managing users
// GET    /users          β†’ List all users
// POST   /users          β†’ Create a new user
// GET    /users/{id}     β†’ Get a specific user
// PUT    /users/{id}     β†’ Update a user (full replacement)
// PATCH  /users/{id}     β†’ Update specific fields
// DELETE /users/{id}     β†’ Delete a user

// Nested resources express relationships:
// GET    /users/{id}/posts      β†’ List a user's posts
// POST   /users/{id}/posts      β†’ Create a post for a user
// GET    /users/{id}/posts/{pid} β†’ Get a specific post

REST's superpower is simplicity. Your API endpoints map directly to your data model. If you've identified your core entities (User, Post, Order, Product), your REST API practically writes itself.

Common mistakes I see in interviews:

  • Using verbs in URLs: POST /createUser instead of POST /users
  • Inconsistent naming: mixing /users/{id} with /get-user?id=123
  • Forgetting pagination: GET /users returning 2 million rows
  • Not using proper status codes: returning 200 with { "error": "not found" } instead of 404

For your interview: REST is the default β€” use it unless you have a specific reason for GraphQL or gRPC. Say "I'll design a RESTful API with these endpoints..." and list your core resources.

GraphQL β€” Flexible Data Fetching

GraphQL lets clients request exactly the data they need in a single query. Instead of the server deciding what each endpoint returns, the client specifies the shape of the response.

Two-panel diagram showing under-fetching (client makes 5 separate REST calls to build one page) and over-fetching (single REST endpoint returns 3KB of data when client needs 200 bytes), with GraphQL solving both by letting the client specify exact fields.
The over-fetching and under-fetching problem is the exact pain point GraphQL was built to solve. If your interview problem involves complex, variable client data needs, GraphQL is worth discussing.

Under-fetching: Your mobile app needs a user's name, their latest post, and that post's comment count. With REST, that's three separate API calls: GET /users/{id}, GET /users/{id}/posts?limit=1, and GET /posts/{pid}/comments/count. Three round trips, three connections, maybe 300ms of mobile latency.

Over-fetching: Your REST endpoint GET /users/{id} returns 50 fields (name, email, avatar, bio, preferences, settings, ...). Your mobile app only needs 3 of them. You're transferring 10Γ— more data than needed on a slow cellular connection.

GraphQL eliminates both: one query, one round trip, exactly the fields you need.

query {
  user(id: "123") {
    name
    latestPost {
      title
      commentCount
    }
  }
}

Where GraphQL shines: Mobile apps with bandwidth constraints. Frontend teams that iterate rapidly and need different data for different views. Public APIs where you can't predict what clients will need (GitHub's API v4 is GraphQL).

Where GraphQL struggles: In system design interviews, the requirements are fixed β€” GraphQL's flexibility doesn't add much when you know exactly what every page needs. Additionally, GraphQL queries can be expensive to execute on the backend (a deeply nested query might trigger hundreds of database calls), creating performance challenges that REST's fixed endpoints avoid.

For your interview: mention GraphQL when the interviewer emphasizes mobile clients, rapidly changing UI requirements, or BFF (Backend-for-Frontend) patterns. Otherwise, default to REST.

gRPC β€” Binary Service Communication

gRPC is Google's RPC framework that uses HTTP/2 and Protocol Buffers (protobuf) for efficient, typed, binary communication between services.

Architecture diagram showing external clients connecting via REST/HTTP to an API Gateway, which communicates with internal microservices via gRPC binary protocol. Internal services communicate with each other using gRPC.
The common pattern: REST for public-facing APIs (broad compatibility), gRPC for internal service-to-service communication (performance and type safety). Your API Gateway translates between the two worlds.

Where REST sends human-readable JSON over HTTP/1.1, gRPC sends binary protobuf over HTTP/2. The result: 3–10Γ— less bandwidth and significantly faster serialization/deserialization.

// Protocol Buffer definition β€” strongly typed schema
message User {
  string id = 1;
  string name = 2;
  string email = 3;
}

service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (stream User); // Server streaming
}

Key advantages:

  • Binary protocol: ~3–10Γ— less bandwidth than JSON
  • Typed schema: Compile-time type safety; breaking changes caught before deploy
  • HTTP/2 native: Multiplexing, streaming, header compression built in
  • Code generation: Client and server stubs auto-generated from .proto files
  • Streaming: Natively supports server streaming, client streaming, and bidirectional streaming

The limitation: Browsers can't natively call gRPC endpoints. There's gRPC-Web as a bridge, but it's clunky. That's why gRPC lives behind your API gateway, not in front of it.

I often see candidates propose gRPC for everything in interviews. Don't do this β€” it signals that you're optimizing prematurely. Mention gRPC for internal service communication when the interviewer asks about performance or when you have obvious high-throughput service-to-service calls (like a recommendation engine querying a feature store millions of times per second).

SSE β€” Server Push Events

Server-Sent Events (SSE) is a standard built on HTTP that allows a server to push many messages over a single connection. The client opens a regular HTTP connection, and the server keeps it open β€” streaming events as they occur.

Server-Sent Events flow showing a single HTTP connection from client to server, with the server pushing multiple event messages over time. The client receives each message as it arrives without making new requests.
SSE is an elegant hack on HTTP: one long-lived response where the server writes events as they happen. The client gets push without WebSocket complexity.
// Server-side SSE (Node.js)
app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Push events as they happen
  const send = (data: object) => {
    res.write(`id: ${Date.now()}\n`);
    res.write(`data: ${JSON.stringify(data)}\n\n`);
  };

  // Example: push price updates every second
  const interval = setInterval(() => {
    send({ price: Math.random() * 100, timestamp: Date.now() });
  }, 1000);

  req.on('close', () => clearInterval(interval));
});

SSE characteristics:

  • Unidirectional: Server β†’ client only. Client can't send messages back over the SSE connection (use a separate HTTP POST for that).
  • Auto-reconnect: The EventSource API automatically reconnects with the last event ID, and the server resumes from where the client dropped off.
  • Text-only: Messages are UTF-8 text (usually JSON). No binary data.
  • HTTP-native: Works through all existing HTTP infrastructure β€” proxies, CDNs, load balancers.

Where SSE wins: Live dashboards, auction price updates, notification feeds, stock tickers. Anywhere the server needs to push updates and the client just listens. SSE is dramatically simpler to implement and operate than WebSockets for push-only use cases.

Interview tip: SSE before WebSocket

If you need server-to-client push but NOT client-to-server push, use SSE. It's simpler, works through all HTTP infrastructure, and auto-reconnects. Reaching for WebSockets when SSE would suffice signals that you're over-engineering. Say "I'd use SSE here because we only need server push β€” WebSockets adds bidirectional overhead we don't need."

SSE is the right tool when the server needs to push events and the client just listens β€” simpler, cheaper, and more infrastructure-compatible than WebSockets.

WebSockets β€” Bidirectional Channels

WebSockets provide a persistent, full-duplex communication channel between client and server. Unlike HTTP's request-response pattern, either side can send messages at any time without waiting for the other.

WebSocket lifecycle showing initial HTTP upgrade request, protocol switch from HTTP to WebSocket, bidirectional message exchange, and eventual connection close.
WebSockets start as HTTP and 'upgrade' to a persistent TCP connection. Once upgraded, both sides can push messages freely β€” no request-response pattern, no headers per message, minimal framing overhead.
// Client-side WebSocket connection
const ws = new WebSocket('wss://chat.example.com/rooms/general');

ws.onopen = () => {
  // Connection established β€” send messages freely
  ws.send(JSON.stringify({ type: 'join', user: 'alice' }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  // Process incoming messages from server or other clients
  renderMessage(message);
};

ws.onclose = () => {
  // Connection dropped β€” implement reconnection logic
  setTimeout(() => reconnect(), 1000 + Math.random() * 2000);
};

When WebSockets are essential:

  • Chat applications β€” Messages must flow both directions instantly
  • Collaborative editing β€” User keystrokes and cursor positions push to server, server pushes to all other editors
  • Real-time gaming β€” Player input goes to server, game state comes back, continuously
  • Trading platforms β€” Orders go to server, price updates and fills come back

The cost of WebSockets:

  • Every connection is stateful β€” the server must track each connected client in memory
  • L4 load balancer required β€” L7 load balancers terminate HTTP connections, which breaks WebSocket passthrough (unless explicitly configured for WebSocket support)
  • Connection management β€” Reconnection, heartbeats, authentication on reconnect all become your problem
  • Horizontal scaling complexity β€” If User A is connected to Server 1 and User B to Server 2, broadcasting a message requires inter-server communication (usually via Redis pub/sub or a message queue)

Don't reach for WebSockets unless you genuinely need bidirectional, high-frequency communication. They're powerful but expensive to operate at scale β€” and announcing WebSockets without justification is a red flag in interviews.

WebRTC β€” Peer-to-Peer Communication

WebRTC enables direct peer-to-peer communication between browsers β€” no intermediary server for the data exchange. It's the only application-layer protocol here that uses UDP under the hood, making it ideal for real-time audio and video.

graph LR
    %% Styling Definitions
    classDef signal fill:#f3e5f5,stroke:#8e24aa,stroke-width:2px,color:#000
    classDef stun fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000
    classDef turn fill:#ffebee,stroke:#c62828,stroke-width:2px,color:#000
    classDef peer fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef media fill:#fff8e1,stroke:#ff8f00,stroke-width:3px,color:#000

    %% Peers on the Extremes
    A["πŸ’» Peer A<br/>(Caller)"]:::peer
    B["πŸ“± Peer B<br/>(Callee)"]:::peer

    %% Servers in the Middle
    STUN{{"🌍 STUN Server<br/>(IP Discovery)"}}:::stun
    Sig{{"πŸ“‘ Signaling Server<br/>(Message Router)"}}:::signal
    TURN{{"πŸ”„ TURN Server<br/>(Media Relay)"}}:::turn

    %% Phase 2: STUN (Placed at top for visual balance)
    A <-.->|"Step 5. Query Public IP"| STUN
    STUN <-.->|"Step 5. Query Public IP"| B

    %% Phase 1: Signaling (SDP & ICE Routing)
    A -->|"Step 1. SDP Offer & ICE"| Sig
    Sig -->|"Step 2. Forward to B"| B
    B -->|"Step 3. SDP Answer & ICE"| Sig
    Sig -->|"Step 4. Forward to A"| A

    %% Phase 4: Fallback Scenario (TURN Relay)
    A -.->|"Step 6b. Relay Media (Fallback)"| TURN
    TURN -.->|"Step 6b. Relay Media to B"| B

    %% Phase 3: Ideal Scenario (Direct Media - Thick Line)
    A <===>|"Step 6a. DIRECT MEDIA STREAM<br/>(UDP/SRTP)"| B

How WebRTC connects peers:

  1. Signaling β€” Both clients connect to a central signaling server (your server, using WebSockets or HTTP) to learn about each other and exchange connection metadata
  2. STUN β€” Each client contacts a STUN server to discover its own public IP and port (most clients are behind NAT)
  3. ICE candidates β€” Clients exchange their discovered addresses via the signaling server
  4. Connection attempt β€” Clients try to connect directly using the exchanged addresses
  5. TURN fallback β€” If direct connection fails (restrictive firewalls, symmetric NAT), traffic relays through a TURN server

The reality: About 85% of WebRTC connections succeed via STUN (direct P2P). The remaining 15% require TURN relay. TURN servers carry the actual media traffic, so they're expensive to operate β€” this is the main operational cost of WebRTC.

For interviews: Use WebRTC only for audio/video calling and conferencing β€” it's complex, brittle, and overkill for anything else. For 1:1 video calls, WebRTC is perfect. For a 100-person video call, you'll need a Selective Forwarding Unit (SFU) β€” a server that receives all streams and selectively forwards them, partially defeating the peer-to-peer advantage.

WebRTC is an absolute pain to get right, and even the best implementations still suffer connection losses. Stick to video and audio β€” anything else is a trap.

Choosing the Right Application Protocol

This is where interviews separate candidates who memorize from candidates who reason. There's no single right answer β€” it depends on your requirements. Here are the decision frameworks.

Decision tree for choosing between REST, GraphQL, and gRPC. Starts with 'Is this a public-facing API?' β€” Yes leads to REST (default) or GraphQL (flexible clients). No leads to 'Is performance critical?' β€” Yes leads to gRPC, No leads to REST.
REST is the safe default for public APIs. gRPC for high-performance internal services. GraphQL when client flexibility is a real requirement, not a hypothetical one.
Decision tree for choosing between SSE, WebSockets, and WebRTC for real-time communication. Starts with 'Need bidirectional communication?' β€” No leads to SSE. Yes leads to 'Is it audio/video?' β€” Yes leads to WebRTC, No leads to WebSockets.
The real-time protocol decision follows a simple escalation: start with SSE (simplest), escalate to WebSockets only if bidirectional is required, and WebRTC only for peer-to-peer media.
Requirement | Protocol | Why not the alternatives
Standard CRUD API | REST | GraphQL adds complexity with no benefit; gRPC lacks browser support
Mobile app with varying data needs | GraphQL | REST over/under-fetches; gRPC doesn't support flexible queries
Internal service at 100K+ RPS | gRPC | REST's JSON serialization becomes a bottleneck; GraphQL adds execution overhead
Live dashboard, auction updates | SSE | WebSockets adds bidirectional overhead you don't need
Chat, collaborative editing | WebSockets | SSE is unidirectional; gRPC not supported in browsers
Video/audio calling | WebRTC | WebSockets adds server in the data path; SSE is unidirectional
Notification feed for mobile | SSE (or push notifications) | WebSockets battery drain; WebRTC absurd for text

The golden rule: use the simplest protocol that meets your requirements. Every step up the complexity ladder (REST β†’ SSE β†’ WebSocket β†’ WebRTC) adds infrastructure cost, operational burden, and failure modes.


Load Balancing

You've picked your protocols and your services are talking to each other. Now what happens when one server can't handle the load? You scale horizontally β€” add more servers.

More servers means you need a way to distribute traffic across them.

Horizontal scaling diagram showing a single overloaded server being replaced by a load balancer distributing traffic across four servers, with health checks removing failed servers from rotation.
The load balancer is the traffic cop of horizontal scaling. Without it, your clients don't know which server to talk to. With it, you can add or remove servers without clients knowing or caring.

Client-Side Load Balancing

The client itself decides which server to talk to. It periodically fetches a list of available servers from a service registry and selects one using a local algorithm (round-robin, random, least connections).

How it works:

  1. Client queries a service registry (e.g., Consul, etcd, or a DNS SRV record) for available servers
  2. Client receives a list of server addresses
  3. Client picks one using its own load-balancing algorithm
  4. Client sends the request directly β€” no intermediary hop
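A minimal sketch of steps 2–4, assuming the server list has already been fetched from the registry β€” the class name and all addresses below are made up:

```typescript
// Minimal client-side balancer: the client holds the server list and
// rotates through it locally, so requests go direct with no extra hop.
class RoundRobinClient {
  private index = 0;

  constructor(private servers: string[]) {}

  // In a real client this list is periodically refreshed from a service
  // registry (Consul, etcd, or DNS SRV records).
  updateServers(servers: string[]): void {
    this.servers = servers;
    this.index = 0;
  }

  pick(): string {
    const server = this.servers[this.index % this.servers.length];
    this.index++;
    return server;
  }
}

const client = new RoundRobinClient(['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080']);
console.log(client.pick()); // 10.0.0.1:8080
console.log(client.pick()); // 10.0.0.2:8080
console.log(client.pick()); // 10.0.0.3:8080
console.log(client.pick()); // 10.0.0.1:8080 (wraps around)
```

The stale-list problem described below falls out of this design: between registry refreshes, `pick()` can happily return a server that died seconds ago.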

Where you'll see this:

  • gRPC has built-in client-side load balancing β€” the client resolves a DNS name to multiple IPs and balances across them
  • Redis Cluster clients discover all nodes and route GET/SET commands to the correct shard directly
  • DNS round-robin rotates IP addresses so each client gets a different "first" server

The advantage: No extra network hop. The client talks directly to the server. This eliminates the load balancer as both a latency source and a single point of failure.

The disadvantage: Every client must implement the balancing logic. Clients may hold stale server lists. Rolling out a new balancing algorithm means updating every client.

For your interview: mention client-side load balancing for internal microservice communication (especially with gRPC). For external traffic, you'll want a dedicated load balancer.

Dedicated Load Balancers

A dedicated load balancer sits between clients and servers, receiving all incoming traffic and distributing it across backend servers. The client sees only the load balancer's address β€” it has no idea how many servers exist behind it.

The key decision with dedicated load balancers is which OSI layer they operate at β€” Layer 4 or Layer 7.

Layer 4 Load Balancers

L4 load balancers operate at the transport layer. They route based on IP address and port β€” they don't inspect the actual content of the packets. Think of them as a transparent pipe: they establish a TCP connection from the client, pick a backend server, and forward all packets from that connection to that server.

Side-by-side comparison of L4 and L7 load balancers. L4 shows a TCP connection forwarded to a backend server. L7 shows the HTTP request being inspected, with routing decisions based on URL path, headers, and cookies.
L4 forwards TCP connections blindly β€” fast but dumb. L7 inspects HTTP requests and routes intelligently β€” slower but powerful. Choose L4 for raw performance and WebSockets; L7 for HTTP routing, header-based decisions, and TLS termination.

L4 characteristics:

  • Forwards entire TCP/UDP connections, not individual requests
  • Cannot read HTTP headers, cookies, or URL paths
  • Very fast β€” minimal packet inspection
  • Preserves client-to-server TCP connection (important for WebSockets)
  • Can handle any protocol, not just HTTP

Layer 7 Load Balancers

L7 load balancers operate at the application layer. They read and understand HTTP requests β€” inspecting headers, URLs, cookies, and request bodies. They terminate the client's TCP/TLS connection and create new connections to backend servers.

L7 characteristics:

  • Routes based on URL path, headers, cookies, query parameters
  • Terminates and re-establishes TCP/TLS connections
  • Can modify requests and responses (add headers, rewrite URLs)
  • Handles TLS termination centrally (offloads crypto from backend servers)
  • More CPU-intensive than L4 due to packet inspection
  • Better for HTTP traffic; supports sticky sessions via cookies

The critical difference: With an L4 load balancer, the client's TCP connection passes through the load balancer to a specific server. With an L7 load balancer, the client has a TCP connection to the load balancer, and the load balancer maintains separate TCP connections to backend servers. This is why L7 load balancers need explicit configuration to support WebSockets: the HTTP "upgrade" request terminates at the load balancer, which must then proxy the long-lived connection through to a backend.

Decision tree for choosing between client-side, L4, and L7 load balancers. Starts with 'Is this external traffic?' β€” No leads to 'Client-side LB (gRPC/internal)'. Yes leads to 'Need WebSocket support?' β€” Yes leads to L4, No leads to L7.
For external HTTP traffic: L7. For WebSockets: L4 (or L7 with explicit WebSocket support). For internal microservices: client-side. When in doubt, L7 is the safe default.

Interview tip: L4 for WebSockets, L7 for everything else

When you add a load balancer in an interview, specify the layer. "I'd use an L7 load balancer here for HTTP routing and TLS termination" shows you know the trade-off. If you later add WebSockets, say "we'll need L4 load balancing for the WebSocket connections to maintain persistent TCP connections to the backend." That single distinction separates generic answers from precise ones.

Health Checks and Fault Tolerance

Load balancers don't just distribute traffic β€” they detect and route around failures. If a backend server crashes, the load balancer stops sending traffic to it within seconds.

How health checks work:

  • TCP health check: Attempts a TCP connection to the server. If the connection succeeds, the server is healthy. Fast and low overhead.
  • HTTP health check: Sends an HTTP request (usually GET /health) and checks for a 200 response. Can verify that the application is actually working, not just that the port is open.
  • Custom health checks: Call a specific endpoint that verifies database connectivity, downstream dependencies, and other application health.
// A proper health check endpoint β€” not just "return 200"
app.get('/health', async (req, res) => {
  try {
    // Check database connectivity
    await db.query('SELECT 1');
    // Check Redis connectivity
    await cache.ping();
    // Check critical downstream service
    const downstream = await fetch('http://payment-service/health', {
      signal: AbortSignal.timeout(2000) // 2s timeout
    });
    if (!downstream.ok) throw new Error('Payment service unhealthy');

    res.status(200).json({ status: 'healthy' });
  } catch (error) {
    res.status(503).json({ status: 'unhealthy', error: error.message });
  }
});

Health checks are the mechanism that makes load balancers reliable. Without them, you're routing traffic to dead servers.

Load Balancing Algorithms

The algorithm determines how the load balancer picks which server handles each request.

Algorithm | How it works | Best for
Round Robin | Sequential rotation: server 1, 2, 3, 1, 2, 3... | Stateless services with uniform request cost
Random | Random server selection | Similar to round robin; simpler implementation
Least Connections | Routes to server with fewest active connections | WebSocket/SSE services with persistent connections
Least Response Time | Routes to the fastest-responding server | When some servers are faster (different hardware, proximity)
IP Hash | Hash of client IP determines server | Session persistence without cookies
Consistent Hashing | Hash ring maps requests to servers | Cache-aware routing; minimizes reshuffling when servers change

For most stateless HTTP services, round robin or random is correct. The request workload is roughly uniform and any server can handle any request.

For services with persistent connections (WebSocket, SSE), use least connections. With round robin, a server that already holds thousands of long-lived connections keeps receiving its full share of new ones, so load never rebalances; least connections steers new connections toward the emptiest servers.

Pick the algorithm that matches your workload pattern, and you'll rarely need to think about it again.
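Consistent hashing is the one algorithm in the table worth sketching, because its benefit β€” minimal reshuffling when servers change β€” falls directly out of the structure. A toy hash ring under simplifying assumptions (FNV-1a hash, no virtual nodes; all names illustrative):

```typescript
// Toy consistent-hash ring. Real rings add many virtual nodes per server
// to smooth out the key distribution.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class HashRing {
  private ring: { position: number; server: string }[] = [];

  addServer(server: string): void {
    this.ring.push({ position: fnv1a(server), server });
    this.ring.sort((a, b) => a.position - b.position);
  }

  removeServer(server: string): void {
    this.ring = this.ring.filter((entry) => entry.server !== server);
  }

  // Walk clockwise from the key's position to the first server at or past it
  route(key: string): string {
    const h = fnv1a(key);
    const entry = this.ring.find((e) => e.position >= h) ?? this.ring[0];
    return entry.server;
  }
}

const ring = new HashRing();
['cache-1', 'cache-2', 'cache-3'].forEach((s) => ring.addServer(s));
const before = ring.route('user:42');

// Adding a server only remaps the keys that fall between the new server
// and its predecessor on the ring; every other key keeps its old home.
ring.addServer('cache-4');
```

Compare with a plain `hash(key) % N` scheme: changing N remaps nearly every key, which for a cache tier means invalidating most of the cache at once.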


Handling Network Failures

Distributed systems fail β€” not "might fail," they will fail. The question isn't whether your network calls will break, but whether your system degrades gracefully when they do. This is where senior candidates separate themselves.

Timeouts and Retries with Backoff

The most fundamental reliability pattern: set a timeout on every network call and retry on failure.

Timeouts prevent one slow service from consuming all your resources. Without a timeout, a stalled downstream service can hold your threads/connections indefinitely until your entire thread pool is exhausted. Set a timeout on every outbound network call β€” every single one.

Retries handle transient failures β€” brief network hiccups, momentary server overloads, temporary DNS resolution failures. Most transient issues resolve within seconds.

Backoff prevents retries from hammering a recovering service. Instead of retrying immediately, you wait β€” and each subsequent retry waits exponentially longer.

class NonRetryableError extends Error {}

async function fetchWithRetry<T>(
  url: string,
  options: { maxRetries?: number; baseDelayMs?: number; timeoutMs?: number } = {}
): Promise<T> {
  const { maxRetries = 3, baseDelayMs = 200, timeoutMs = 5000 } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, {
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (response.ok) return response.json() as T;
      // 4xx is the client's fault β€” retrying won't change the outcome
      if (response.status < 500) throw new NonRetryableError(`Client error: ${response.status}`);
      // 5xx β†’ fall through and retry
    } catch (error) {
      if (error instanceof NonRetryableError || attempt === maxRetries) throw error;
    }

    // Exponential backoff with jitter: 200ms, 400ms, 800ms (Β±random)
    const delay = baseDelayMs * Math.pow(2, attempt);
    const jitter = delay * 0.5 * Math.random(); // 0-50% jitter
    await new Promise(resolve => setTimeout(resolve, delay + jitter));
  }
  throw new Error('Unreachable');
}

Jitter is not optional

Without jitter, all clients retry at exactly the same intervals β€” creating synchronized waves of retries that repeatedly spike load on a struggling service. Jitter randomizes the retry timing so retries spread across a window instead of firing simultaneously. It's the difference between a recovery and a sustained outage. Always add jitter to your backoff.

The magic phrase interviewers look for: "Retry with exponential backoff and jitter." That sentence signals you understand both the retry mechanism and the failure mode it can create.

Idempotency

Retries are dangerous when operations have side effects. If you retry a payment request, you might charge the customer twice. Idempotency means that performing the same operation multiple times produces the same result as performing it once.

How to make writes idempotent:

  1. Assign each operation a unique idempotency key (UUID, or a natural key like user_id + timestamp)
  2. Before processing, check if that key has already been processed
  3. If already processed, return the stored result without re-executing
  4. If new, process the operation and store the result keyed by the idempotency key
async function processPayment(req: PaymentRequest): Promise<PaymentResult> {
  // Node lowercases incoming header names
  const idempotencyKey = req.headers['idempotency-key'];

  // Check if we've already processed this request
  const existing = await db.query(
    'SELECT result FROM processed_payments WHERE idempotency_key = $1',
    [idempotencyKey]
  );
  if (existing.rows.length > 0) {
    return existing.rows[0].result; // Return stored result β€” no re-charge
  }

  // Process the payment
  const result = await chargeCard(req.body.amount, req.body.cardToken);

  // Store result for future duplicate detection. A UNIQUE constraint on
  // idempotency_key guards against two concurrent requests racing past
  // the SELECT above.
  await db.query(
    'INSERT INTO processed_payments (idempotency_key, result) VALUES ($1, $2)',
    [idempotencyKey, JSON.stringify(result)]
  );

  return result;
}

HTTP GET, PUT, and DELETE are inherently idempotent. POST requires explicit idempotency implementation. In interviews, whenever you propose retries for write operations, immediately follow with "using idempotency keys to prevent duplicate processing."

Circuit Breakers

When a downstream service is consistently failing, retries make things worse β€” you're adding load to a system that's already drowning. Circuit breakers detect sustained failures and stop calling the failing service entirely, giving it time to recover.

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure count<br/>exceeds threshold
    Open --> HalfOpen : Timeout expires<br/>(cool-down period)
    HalfOpen --> Closed : Test request<br/>succeeds
    HalfOpen --> Open : Test request fails
    
    note right of Closed : Normal operation
    note right of Open : Fail fast β€” no calls
    note right of HalfOpen : Test with single request

The three states:

  • Closed (normal) β€” Requests flow through normally. Failures are counted.
  • Open (tripped) β€” All requests immediately fail without calling the downstream service. No load on the failing dependency. The circuit "trips" when failures exceed a threshold (e.g., 5 failures in 10 seconds).
  • Half-Open (testing) β€” After a cool-down period, one test request is allowed through. If it succeeds, circuit closes. If it fails, circuit reopens.

The insight most candidates miss: circuit breakers are as much about protecting the failing service as they are about protecting your service. By stopping the flood of requests, you give the downstream service room to recover instead of pummeling it with retries it can't handle.

Interview tip: combine all three patterns

The strongest failure-handling answer combines all three: "Each service call has a 3-second timeout. On timeout, we retry with exponential backoff and jitter, up to 3 attempts, using idempotency keys to prevent duplicate processing. If a service fails consistently, a circuit breaker trips β€” failing fast for 30 seconds before testing again." That's four sentences, and it covers every failure pattern interviewers look for.


Regionalization and Latency

For global services, the physical distance between clients and servers determines your baseline latency. Light travels through fiber at about 200,000 km/s, so a New York to London round trip (~5,600 km each way) has a theoretical minimum of ~56ms, before any processing.
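That baseline is simple arithmetic you can reproduce in an interview. A sketch, using the usual rule-of-thumb fiber speed (the second distance is an approximation):

```typescript
// Back-of-envelope RTT floor from geography alone; ignores routing detours,
// queuing, and processing, so real round trips are always slower.
const FIBER_SPEED_KM_PER_MS = 200; // light in fiber travels at ~2/3 of c

function minRoundTripMs(oneWayKm: number): number {
  return (2 * oneWayKm) / FIBER_SPEED_KM_PER_MS;
}

console.log(minRoundTripMs(5600));  // 56 ms: New York <-> London
console.log(minRoundTripMs(12000)); // 120 ms: roughly US East Coast <-> Tokyo
```

No amount of server tuning gets you under that floor β€” only moving the endpoints closer together does, which is exactly what CDNs and regional deployment do.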

Global deployment showing CDN edge locations close to users, regional application servers in US-East and EU-West, and a primary database with cross-region replication.
Data locality is the fundamental lever for reducing latency. CDN for static content, regional app servers for compute, and regional read replicas for data. The goal: keep as much work as possible geographically close to the user.

Content Delivery Networks (CDNs)

CDNs cache static and semi-static content at edge locations close to users β€” Cloudflare has 300+ cities, CloudFront has 600+ edge locations. When a user requests a cached asset, it's served from a server that might be 10ms away instead of 200ms away.

For interviews, reach for a CDN whenever you have static content (images, videos, JS bundles) or highly cacheable API responses (search results, product catalogs). The CDN itself is covered in depth in the CDN article β€” here just know it's the networking solution to geographic latency for read-heavy data.

Regional Partitioning

When your data is inherently geographic (Uber rides are local, DoorDash orders are local), partition your services and data by region. Each region handles all reads and writes for its geography with co-located servers and databases.

Why this works: If Uber's Miami users only care about Miami drivers, put both the user data and driver data for Miami in a US-Southeast region. Every query stays within a 5ms network boundary instead of crossing 100ms to a central database.

The complication: Cross-region queries. What if a Miami user wants to see their ride history from when they visited New York? You need either cross-region reads (slow) or asynchronous replication of user data to every region (complex, potentially stale). Regional partitioning only works cleanly when the data access pattern is genuinely regional.

CDNs for static content, regional partitions for geographic data β€” those two tools handle 90% of global latency challenges.


Trade-offs

Pros | Cons
Protocol selection directly optimizes for your workload (latency, throughput, reliability) | More protocol-aware architecture = more complexity and specialized knowledge required
Load balancers enable horizontal scaling and automatic failover | Load balancers are an additional hop, potential SPOF, and infrastructure cost
Retries and circuit breakers make the system resilient to transient failures | Retry storms can amplify failures; circuit breakers can mask underlying problems
Regional deployment reduces latency for global users | Cross-region replication adds consistency challenges and infrastructure cost
gRPC's binary protocol provides 3–10Γ— efficiency for internal communication | Mixed protocol stack (REST external, gRPC internal) adds cognitive and operational overhead
WebSockets enable real-time bidirectional communication | Stateful connections complicate horizontal scaling and require specialized infrastructure

The fundamental tension here is simplicity vs. performance/resilience β€” every networking sophistication (protocol optimization, smart load balancing, regional deployment) reduces latency or improves reliability but adds infrastructure you must understand, monitor, and maintain. The right answer is always the simplest architecture that meets your requirements β€” then optimize only the bottlenecks you can prove exist.


When to Use It / When to Avoid It

Ok, so the question isn't really "when to use networking" β€” you can't avoid it. The real question is when to invest in advanced networking patterns vs. keeping things simple.

Invest in networking sophistication when:

  • Your services communicate at high frequency (>10K RPS between two services) β€” protocol choice matters
  • You have real-time requirements (chat, live updates, gaming) β€” SSE/WebSocket/WebRTC is necessary
  • Your users are globally distributed β€” CDN and regional deployment are essential
  • Reliability is critical (payments, order processing) β€” retries, idempotency, and circuit breakers are table stakes
  • Internal service mesh is complex (20+ microservices) β€” client-side load balancing and gRPC pay off

Keep it simple when:

  • You have a monolith or 2–3 services β€” REST over HTTP is fine for everything
  • Traffic is moderate (under 1K RPS) β€” default round-robin load balancing works
  • Users are in one region β€” skip regional deployment complexity
  • You're prototyping β€” don't design for global scale on day one

If your system design fits on a whiteboard with REST and a single load balancer, don't add complexity to impress your interviewer. Start simple, then optimize the bottleneck they ask about. Complexity you don't need is complexity that will break.


Real-World Examples

Discord β€” 100M+ Active Users on WebSockets and Elixir

Discord serves over 100 million monthly active users with real-time messaging, voice, and video. Every connected client maintains a WebSocket connection for instant message delivery. Their gateway servers (written in Elixir) each handle ~1 million concurrent WebSocket connections.

The non-obvious lesson: Discord's biggest scaling challenge wasn't the WebSocket connections themselves β€” it was fan-out. A message sent to a server with 500,000 members must be delivered to 500,000 WebSocket connections, possibly spread across hundreds of gateway servers. They solved this with a custom pub/sub layer that routes messages only to the gateway servers with relevant connected clients β€” not all gateway servers.

Cloudflare β€” Anycast DNS at 50M+ Requests/Second

Cloudflare's DNS resolver (1.1.1.1) handles over 50 million queries per second across 300+ cities worldwide. They use Anycast β€” the same IP address is advertised from every data center, and BGP routing directs each query to the nearest one.

The non-obvious lesson: Even with Anycast, Cloudflare still has to deal with asymmetric routing. A client might send a DNS query to one data center but receive the response from a different one (due to BGP route changes). Their stateless UDP design makes this work β€” every server can independently answer any query without knowing about the others.

Stripe β€” Idempotency at Financial Scale

Stripe processes hundreds of billions of dollars in payments annually. Every mutation API supports idempotency keys, allowing clients to safely retry failed payment requests without double-charging. The idempotency system stores the complete response of the first successful request and returns it verbatim on subsequent retries β€” ensuring byte-identical responses regardless of how many times a request is retried.

The non-obvious lesson: Stripe's idempotency keys expire after 24 hours. They found that keys lasting longer than that caused more confusion than benefit β€” stale keys preventing legitimate new operations. The 24-hour window is long enough to handle any retry scenario but short enough to prevent accidental deduplication of genuinely new requests.


How This Shows Up in Interviews

Networking knowledge rarely gets its own dedicated question β€” it surfaces as a cross-cutting concern across every system design problem. When you draw an arrow between two services, your interviewer may probe on what protocol that arrow uses, how it handles failure, and how it scales.

When to bring it up proactively:

  • When you draw your first service-to-service arrow β€” name the protocol (REST, gRPC)
  • When you need real-time updates β€” justify your choice of SSE vs WebSocket vs polling
  • When discussing scalability β€” load balancer type (L4/L7) and algorithm
  • When an interviewer asks "what happens if this service goes down" β€” timeouts, retries, circuit breakers

Depth expected by level:

  • Mid-level: Know TCP vs UDP, REST APIs, and basic load balancing. Can explain why you'd use a load balancer.
  • Senior: Can justify protocol choices (REST vs gRPC), explain L4 vs L7 trade-offs, and design retry strategies with idempotency. Understands WebSocket scaling challenges.
  • Staff: Can discuss QUIC, connection pooling, regional partitioning, circuit breaker state machines, and DNS load balancing failure modes. Understands the full stack implications of protocol choices.

Interview tip: name the protocol on every arrow

When drawing your architecture diagram, label each arrow with the protocol. "REST over HTTPS" between client and API gateway. "gRPC" between internal services. "WebSocket" for real-time connections. This takes 10 seconds and signals protocol awareness that most candidates skip.

Interviewer asks | Strong answer
"How do services communicate here?" | "REST over HTTPS for the public API, gRPC for internal service calls where we need low latency and type safety."
"What happens if the payment service goes down?" | "3-second timeout, retry with exponential backoff and jitter up to 3 times, idempotency key on every payment request. If failures persist, circuit breaker trips β€” we return a 'payment pending' status and process via async queue."
"Why WebSockets and not SSE?" | "Because our chat feature requires bidirectional messaging β€” the client needs to send messages and receive them in real-time. SSE only supports server push."
"How do you handle global users?" | "CDN for static assets, regional API servers with co-located databases, and DNS-based routing to the nearest region. Cross-region reads fall back to the primary with ~100ms added latency."
"Why not use gRPC for the public API?" | "Browsers can't natively call gRPC. We'd need gRPC-Web as a bridge, adding complexity. REST is universally supported and the serialization overhead isn't our bottleneck β€” database queries are."


Quick Recap

  1. Networking determines how your distributed services communicate β€” every arrow on your whiteboard is a protocol choice with latency, reliability, and scalability implications.
  2. TCP is the default transport protocol β€” use UDP only when packet loss is acceptable and latency is critical (streaming, gaming). QUIC is TCP's modern upgrade for mobile-first services.
  3. REST is the default API protocol β€” reach for GraphQL when client flexibility matters, gRPC when internal service throughput is critical, SSE for server push, and WebSockets only when you need bidirectional real-time communication.
  4. Load balancers distribute traffic and detect failures β€” L7 for HTTP (most common), L4 for WebSockets, client-side for internal microservices. Always specify the layer in your interview.
  5. Networks fail, and your design must handle it β€” timeouts on every call, retries with exponential backoff and jitter, idempotency keys for write operations, and circuit breakers for cascading failure prevention.
  6. Geographic latency is bounded by physics β€” CDNs for static content, regional partitioning for geographic data, and DNS-based routing for directing users to the nearest region.
  7. The strongest interview signal is naming the protocol on every arrow β€” "REST over HTTPS for the public API, gRPC for internal calls, WebSocket for real-time features" shows deliberate engineering, not handwaving.

Related Concepts

  • Load Balancing β€” Deep dive into load balancing algorithms, health checks, and scaling strategies. If your interview probes "how does the load balancer work," this article has the answers.
  • Caching β€” The most effective way to reduce network round trips. Understanding cache layers (browser, CDN, application) is the complement to understanding the network calls they eliminate.
  • CDN β€” Detailed coverage of Content Delivery Networks, edge caching strategies, and cache invalidation. The networking layer's primary tool for reducing global latency.
  • Scalability β€” Horizontal and vertical scaling patterns that networking enables. Load balancing, connection pooling, and regional deployment are the networking implementation of scalability principles.
  • Microservices β€” Service-to-service communication is the core of microservices architecture. Protocol selection (REST vs gRPC), service discovery, and failure handling directly apply.
