gRPC internals
How gRPC works under the hood: HTTP/2 multiplexing, Protobuf binary framing, the four RPC types (unary/server streaming/client streaming/bidirectional), connection management, and operational tradeoffs vs REST.
The problem
Your e-commerce platform has 40 microservices. Each service calls 3-5 others per request. The order service calls inventory, payment, shipping, and notification services, all via REST/JSON over HTTP/1.1. At 10,000 orders per second, the system makes 40,000-50,000 inter-service HTTP calls per second.
Each REST call sends a JSON payload (text, verbose, with repeated field names), opens a new TCP connection (or waits for one from a pool), serializes/deserializes through a text parser, and uses HTTP/1.1, which blocks behind previous requests on the same connection. The CPU overhead of JSON parsing alone consumes 15% of your service's compute. Connection management adds latency, and there is no schema enforcement: a field name typo deploys to production and fails at runtime.
This is the problem gRPC solves. It replaces text with binary, multiplexes hundreds of RPCs over a single connection, enforces a typed contract at compile time, and provides native streaming without websocket workarounds.
What it is
gRPC is a remote procedure call framework that combines HTTP/2 as the transport protocol with Protocol Buffers (Protobuf) as the serialization format, wrapped in a code-generation system that produces typed client and server stubs from a shared .proto schema.
Think of it like a phone call vs sending letters. REST is like sending letters: you write a message in English (JSON), put it in an envelope (HTTP/1.1), address it, and wait for a reply letter. gRPC is like a phone call: you dial once (establish an HTTP/2 connection), speak directly in a shared language (Protobuf binary), and both sides can talk simultaneously (streaming). The phone line stays open for multiple conversations.
How it works
A gRPC call flows through several layers: the application calls a generated stub, which serializes the request to Protobuf binary, wraps it in a gRPC frame (5-byte header with compression flag and message length), sends it over an HTTP/2 stream with HPACK-compressed headers, and the server reverses the process.
The RPC method name maps to an HTTP/2 path: /PackageName.ServiceName/MethodName. Headers are compressed using HPACK (HTTP/2 header compression), which is especially efficient for gRPC because consecutive calls to the same method share nearly identical headers.
// gRPC wire format for each message:
[1 byte] compressed flag (0 = no, 1 = yes)
[4 bytes] message length (big-endian uint32)
[N bytes] serialized protobuf payload
// Full HTTP/2 frame sequence for a unary RPC:
Client → Server: HEADERS frame (method, path, content-type)
Client → Server: DATA frame (gRPC message above)
Client → Server: END_STREAM
Server → Client: HEADERS frame (content-type)
Server → Client: DATA frame (gRPC response message)
Server → Client: HEADERS frame (grpc-status, grpc-message) + END_STREAM
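The 5-byte message header is simple enough to pack by hand. This is a sketch of just the framing step (the function names are mine, not the grpc library's API):

```python
import struct

def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    # 1-byte compressed flag, then big-endian uint32 payload length
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload

def unframe_message(data: bytes) -> tuple[bool, bytes]:
    # Reverse: read the 5-byte header, then slice out the payload
    compressed, length = struct.unpack(">BI", data[:5])
    return bool(compressed), data[5:5 + length]
```

In practice the gRPC runtime does this for you; the point is how little per-message overhead there is compared to an HTTP/1.1 request line plus headers.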
HTTP/2 multiplexing
HTTP/1.1 has head-of-line blocking: requests on a connection must complete in order. If Request 1 takes 500ms, Request 2 waits even if its response is ready in 5ms. Browsers work around this by opening 6+ parallel TCP connections per host, which costs extra memory, repeated TCP slow starts, and additional TLS handshakes.
HTTP/2 multiplexes multiple independent streams over a single TCP connection. Each gRPC call gets its own stream ID. Frames from different streams interleave freely.
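For example, three concurrent RPCs might interleave like this (stream IDs are illustrative; client-initiated HTTP/2 streams use odd IDs):

```
Stream 1: GetUser     HEADERS(1)  DATA(1) .............. DATA(1, END_STREAM)
Stream 3: ListUsers       HEADERS(3)  DATA(3, END_STREAM)
Stream 5: GetUser             HEADERS(5)  DATA(5, END_STREAM)

On the wire (one TCP connection, frames interleaved):
HEADERS(1) HEADERS(3) DATA(1) DATA(3,END) HEADERS(5) DATA(5,END) DATA(1,END)
```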
For gRPC, this means a single connection handles hundreds of concurrent RPCs. I've seen production gRPC channels sustain 5,000+ concurrent streams on a single TCP connection. The per-RPC overhead is tiny: a few bytes of framing instead of a new TCP handshake.
HTTP/2 also provides flow control per stream. If one RPC's client falls behind in reading a server-streaming response, HTTP/2 flow control pauses only that stream. Other streams continue unaffected. This is critical for gRPC streaming: a slow consumer on one stream does not block fast consumers on other streams.
HTTP/2 solves head-of-line blocking at the HTTP level, but TCP still has HOL blocking at the transport level. If a single TCP segment is lost, all streams on that connection stall until the segment is retransmitted. QUIC (HTTP/3) solves this by implementing streams at the transport layer, but gRPC over QUIC is still experimental.
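Per-stream flow control can be modeled as a byte credit counter: the sender spends credit on DATA frames, and the receiver replenishes it with WINDOW_UPDATE frames as it reads. A toy sketch (the class and method names are mine; 65535 is HTTP/2's default initial window size):

```python
class StreamFlowControl:
    """Toy model of HTTP/2 per-stream flow control, for illustration only."""
    def __init__(self, window: int = 65535):
        self.window = window  # bytes the sender may still transmit on this stream

    def can_send(self, nbytes: int) -> bool:
        return nbytes <= self.window

    def send(self, nbytes: int) -> None:
        assert self.can_send(nbytes), "would exceed the stream window"
        self.window -= nbytes

    def window_update(self, nbytes: int) -> None:
        # Receiver read nbytes from its buffer and grants more credit
        self.window += nbytes
```

A slow consumer simply stops sending WINDOW_UPDATEs, so its stream's window drains to zero and only that sender pauses; every other stream on the connection keeps its own independent counter.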
Protobuf serialization
JSON is text. Every field name, every bracket, every quote character is serialized and transmitted. Protobuf replaces field names with small integer tags and uses binary encoding:
HTTP/1.1 REST + JSON:
POST /users HTTP/1.1
Content-Type: application/json
{"user_id": 12345, "name": "Alice", "email": "alice@example.com"}
(65-byte body, text, must be parsed character by character; headers add more)
gRPC + Protobuf:
gRPC message header (5 bytes): [compressed flag][message length]
Protobuf payload:
field 1 (user_id, varint): \x08\xB9\x60 (3 bytes)
field 2 (name, string): \x12\x05Alice (7 bytes)
field 3 (email, string): \x1A\x11alice@example.com (19 bytes)
(29-byte payload, 34 bytes with the 5-byte gRPC header; binary, schema-validated)
Protobuf uses varint encoding for integers: small numbers (0-127) fit in a single byte. Field numbers (1, 2, 3) replace field names, so "user_id" never appears on the wire. The wire type (varint, length-delimited, fixed32, fixed64) is packed into the tag byte.
// Protobuf wire format for field 1 (user_id = 12345):
Tag byte: 0x08 = (field_number=1 << 3) | wire_type=0 (varint)
Value: 0xB9 0x60 = varint encoding of 12345
// Protobuf wire format for field 2 (name = "Alice"):
Tag byte: 0x12 = (field_number=2 << 3) | wire_type=2 (length-delimited)
Length: 0x05 = 5 bytes
Value: "Alice" (raw UTF-8 bytes)
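A minimal varint and tag encoder reproduces these exact bytes (a sketch; the function names are mine, not the protobuf library's API):

```python
def encode_varint(n: int) -> bytes:
    """Little-endian base-128: 7 payload bits per byte, MSB set means 'more bytes'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    """Tag byte(s): field number shifted left 3, low 3 bits hold the wire type."""
    return encode_varint((field_number << 3) | wire_type)
```

Running it: `encode_varint(12345)` yields `b"\xb9\x60"` and `tag(1, 0)` yields `b"\x08"`, matching the breakdown above.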
The .proto schema is shared between client and server at build time:
syntax = "proto3";

message User {
  int32 user_id = 1;  // field number 1, varint encoding
  string name = 2;    // field number 2, length-delimited
  string email = 3;   // field number 3, length-delimited
}
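A hand-rolled encoder for this User message shows the whole wire image in a few lines (a sketch for illustration; real code uses protoc-generated classes, and these function names are mine):

```python
def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        b, n = n & 0x7F, n >> 7
        out.append(b | 0x80 if n else b)  # set continuation bit if more bytes follow
        if not n:
            return bytes(out)

def encode_user(user_id: int, name: str, email: str) -> bytes:
    """Hand-encode the User message: tag byte, then varint or length-delimited value."""
    msg = b"\x08" + encode_varint(user_id)                         # field 1, varint
    msg += b"\x12" + encode_varint(len(name.encode())) + name.encode()    # field 2
    msg += b"\x1a" + encode_varint(len(email.encode())) + email.encode()  # field 3
    return msg
```

Encoding `(12345, "Alice", "alice@example.com")` produces a 29-byte payload, versus the 65-byte JSON body for the same data.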
Backwards compatibility is built into the wire format. New fields added to the schema are ignored by old clients (unknown tag numbers are skipped). Old fields removed from the schema default to zero values for new clients. This is why Protobuf uses field numbers instead of field names: as long as you never reuse a field number, evolution is safe.
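The skip-unknown behavior can be demonstrated with a minimal decoder (a sketch handling only varint and length-delimited wire types; the function name is mine):

```python
def decode_known_fields(payload: bytes, known_fields: set[int]) -> dict[int, object]:
    """Keep fields this 'client' knows about, silently skip unknown tag numbers,
    which is how old clients tolerate fields added to the schema later."""
    def read_varint(buf: bytes, i: int) -> tuple[int, int]:
        shift = result = 0
        while True:
            b = buf[i]
            i += 1
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result, i
            shift += 7

    fields, i = {}, 0
    while i < len(payload):
        tag_value, i = read_varint(payload, i)
        field_number, wire_type = tag_value >> 3, tag_value & 0x07
        if wire_type == 0:                       # varint
            value, i = read_varint(payload, i)
        elif wire_type == 2:                     # length-delimited
            length, i = read_varint(payload, i)
            value, i = payload[i:i + length], i + length
        else:
            raise ValueError("wire type not handled in this sketch")
        if field_number in known_fields:         # unknown numbers are just skipped
            fields[field_number] = value
    return fields
```

A client compiled against a schema that only defines fields 1 and 2 decodes the same bytes correctly; the email field (3) is read past and dropped, not an error.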
The four streaming types
gRPC supports four RPC patterns, all built natively on HTTP/2 streams:
service UserService {
  // Unary: one request, one response
  rpc GetUser (GetUserRequest) returns (User);

  // Server streaming: one request, stream of responses
  rpc ListUsers (ListUsersRequest) returns (stream User);

  // Client streaming: stream of requests, one response
  rpc BulkCreateUsers (stream CreateUserRequest) returns (BulkCreateResponse);

  // Bidirectional streaming: both sides stream independently
  rpc ChatStream (stream ChatMessage) returns (stream ChatMessage);
}
Unary is the most common: request-response, like a REST call. Server streaming is ideal for list/watch operations (send a query, receive a stream of results). Client streaming suits bulk uploads (stream many records, get one acknowledgement). Bidirectional streaming enables real-time communication (chat, live telemetry, collaborative editing) without websocket workarounds.
Streaming is not a bolted-on feature. HTTP/2's DATA frames can be sent continuously on an open stream. Either side can send messages at any time, and HTTP/2 flow control prevents a fast sender from overwhelming a slow receiver. This is why I prefer gRPC for any internal service that needs to push updates: the transport handles backpressure natively.
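In the Python server API, a server-streaming handler is essentially a generator. This sketch avoids the real grpc runtime entirely: the request and response types are hypothetical dataclass stand-ins for protoc-generated messages:

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical stand-ins for protoc-generated message classes
@dataclass
class User:
    user_id: int
    name: str

@dataclass
class ListUsersRequest:
    name_prefix: str

USERS = [User(1, "Alice"), User(2, "Bob"), User(3, "Alana")]

def ListUsers(request: ListUsersRequest) -> Iterator[User]:
    # Server streaming: each yielded value becomes one gRPC message on the
    # stream; the runtime applies HTTP/2 flow control between yields.
    for user in USERS:
        if user.name.startswith(request.name_prefix):
            yield user
```

Because results are yielded lazily, a slow client applies backpressure all the way into this loop: the generator simply pauses until the stream window reopens.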
Connection management
gRPC maintains long-lived HTTP/2 connections per server. Connection setup is expensive (TCP handshake + TLS negotiation + HTTP/2 SETTINGS frame exchange), so connection reuse is critical:
# Channel creation (expensive, do once at startup):
channel = grpc.insecure_channel('user-service:8080')
stub = user_pb2_grpc.UserServiceStub(channel)
# RPC calls (cheap, reuse the channel):
for user_id in user_ids:
    response = stub.GetUser(GetUserRequest(user_id=user_id))
gRPC channels maintain a connection pool internally. They handle keepalive (periodic HTTP/2 PING frames to detect dead connections), automatic reconnection with exponential backoff, and load balancing across multiple server addresses:
channel = grpc.insecure_channel(
    'user-service:8080',
    options=[
        ('grpc.keepalive_time_ms', 30000),       # PING every 30s
        ('grpc.keepalive_timeout_ms', 5000),     # wait 5s for PING response
        ('grpc.max_connection_idle_ms', 60000),  # close idle connections after 60s
        ('grpc.max_connection_age_ms', 300000),  # recycle connections every 5 min
    ]
)
Interview tip: gRPC load balancing pitfall
gRPC's long-lived connections create a load balancing problem. A traditional L4 load balancer (TCP-level) distributes connections, not individual RPCs. If ServiceA opens one gRPC connection to a load balancer, all RPCs flow through that single connection to a single backend. You need either client-side load balancing (the gRPC channel resolves multiple addresses and distributes RPCs) or an L7 proxy like Envoy that understands HTTP/2 streams and balances at the RPC level.
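The client-side option reduces to picking a backend per RPC rather than per connection. A trivial round-robin picker over resolved addresses (the class and its pick() API are illustrative, not real grpc internals):

```python
import itertools

class RoundRobinPicker:
    """Illustrative sketch of client-side load balancing: the channel resolves
    several backend addresses and spreads RPCs across them, rather than pinning
    every RPC to the single TCP connection an L4 balancer chose."""
    def __init__(self, addresses: list[str]):
        self._cycle = itertools.cycle(addresses)

    def pick(self) -> str:
        # Called once per RPC, so load spreads even though connections are long-lived
        return next(self._cycle)

picker = RoundRobinPicker(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
```

Real gRPC channels implement this via a resolver (DNS, xDS) plus a load balancing policy such as round_robin; the sketch only shows why per-RPC picking fixes what per-connection picking breaks.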
Deadlines and cancellation propagation
gRPC has first-class support for deadlines (timeouts) and cancellation that propagate across service boundaries. This is one of gRPC's most underappreciated features for distributed systems.
When a client sets a deadline, the remaining time is sent as the grpc-timeout header on the HTTP/2 request. Each downstream service receives the remaining time, not the original timeout. If ServiceA calls ServiceB with a 5s deadline, and ServiceB takes 1s to process before calling ServiceC, ServiceC receives a ~4s deadline.
// Deadline propagation chain:
Client → ServiceA (deadline: 5000ms)
ServiceA processes for 500ms
ServiceA → ServiceB (deadline: 4500ms) // remaining time
ServiceB processes for 300ms
ServiceB → ServiceC (deadline: 4200ms) // remaining time
ServiceC returns in 100ms
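The chain above reduces to subtracting each hop's processing time from the remaining budget before forwarding it; a minimal sketch (the function name is mine):

```python
def downstream_deadlines(initial_ms: int, processing_ms: list[int]) -> list[int]:
    """Each hop subtracts its own processing time and forwards only what remains,
    mirroring how gRPC re-derives the grpc-timeout header per outbound call."""
    remaining, forwarded = initial_ms, []
    for spent in processing_ms:
        remaining -= spent
        forwarded.append(remaining)  # deadline sent to the next service
    return forwarded
```

Feeding in the trace above, a 5000ms budget with 500ms and 300ms of processing at the first two hops forwards 4500ms and then 4200ms downstream.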
When a deadline expires or a client cancels an RPC, the cancellation propagates to all downstream services. HTTP/2 RST_STREAM frames signal cancellation at the transport layer. This prevents wasted work: if the client has already timed out, every downstream service stops processing immediately.
Without deadline propagation (as in REST), ServiceC would continue processing even though the client already timed out. I've seen systems where a 5-second client timeout triggers a cascade of 30-second database queries across downstream services, all producing results that nobody reads.
gRPC vs REST comparison
| Dimension | REST (HTTP/1.1 + JSON) | gRPC (HTTP/2 + Protobuf) |
|---|---|---|
| Payload size | 2-10x larger (text JSON, repeated field names) | Compact (binary varint, field numbers) |
| Serialization speed | Slower (text parsing, no schema) | 5-10x faster (binary, schema-driven, fewer allocations) |
| Streaming | Workarounds needed (SSE, WebSocket) | Native (4 RPC types, flow control) |
| Deadline propagation | Manual (pass timeout headers yourself) | Built-in (grpc-timeout propagates automatically) |
| Browser support | Native | Requires grpc-web proxy (Envoy, grpc-gateway) |
| Human-readable | Yes (curl, browser dev tools) | No (need grpcurl, BloomRPC, or protoc) |
| Contract enforcement | Optional (OpenAPI, often out of date) | Enforced by .proto schema at compile time |
| Error model | HTTP status codes (limited) | Rich gRPC status codes + error details proto |
| Code generation | Optional (swagger-codegen) | Required and central (protoc generates stubs) |
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| Google | All internal service-to-service communication. gRPC descends from "Stubby," Google's internal RPC framework since 2001, open-sourced as gRPC in 2015. | Billions of RPCs per second across Google's infrastructure. Stubby predates HTTP/2; gRPC was redesigned to run on standard HTTP/2. |
| Netflix | gRPC for inter-service communication between backend microservices. | Netflix migrated from REST to gRPC for latency-critical paths. Uses custom load balancing with Eureka service discovery integrated into gRPC channels. |
| Envoy / Istio | Envoy proxy speaks gRPC natively for both data plane (proxying RPCs) and control plane (xDS API for configuration). | Envoy's xDS configuration API is entirely gRPC streaming. Service mesh sidecars use gRPC for efficient sidecar-to-sidecar communication with L7 load balancing per-RPC. |
| Kubernetes | etcd (Kubernetes backing store) uses gRPC for client-server and peer-to-peer communication. The Kubernetes API server talks to etcd via gRPC. | etcd uses bidirectional streaming for the Watch API: clients subscribe to key changes and receive a stream of updates. This replaces HTTP long-polling. |
| Buf / Connect | Connect is a gRPC-compatible framework that also supports HTTP/1.1+JSON for the same .proto service definitions. | Solves the browser compatibility problem: a single .proto definition generates both gRPC (for internal services) and REST/JSON (for browsers) endpoints. |
Limitations and when NOT to use it
- No native browser support. Browsers cannot make HTTP/2 requests with the gRPC framing. You need grpc-web (which proxies through Envoy or a gateway) or a dual-protocol framework like Connect. For public-facing APIs consumed by web browsers, REST is simpler.
- Debugging is harder. gRPC traffic is binary. You cannot inspect payloads with curl, browser dev tools, or basic HTTP logging. You need grpcurl, BloomRPC, or gRPC-specific middleware to log decoded messages. In production incidents, this slows down investigation.
- Load balancing requires L7 awareness. Standard L4 load balancers (AWS NLB, HAProxy in TCP mode) distribute connections, not RPCs. A single gRPC connection carries hundreds of RPCs, so L4 balancing creates hot spots. You need Envoy, Linkerd, or client-side balancing.
- Protobuf schema evolution has constraints. You cannot rename fields (only the field number matters on the wire), you cannot change a field's type without breaking compatibility, and you must never reuse a deleted field number. Teams that do not understand these rules break backwards compatibility.
- Streaming adds operational complexity. Long-lived streams hold resources on both client and server. A server-streaming RPC that runs for hours ties up a goroutine/thread and an HTTP/2 stream. Without proper deadline/cancellation handling, leaked streams cause resource exhaustion.
- Smaller ecosystem for tooling. REST has decades of tooling: Postman, Swagger UI, browser dev tools, API gateways. gRPC's tooling is newer and less mature. Rate limiting, caching, and API management are harder with gRPC than REST.
Interview cheat sheet
- When asked "gRPC or REST for microservices": gRPC for internal service-to-service communication (binary efficiency, streaming, deadline propagation, typed contracts). REST for public APIs consumed by browsers and third parties. The deciding factor is usually browser support.
- When discussing performance: gRPC's Protobuf encoding is 2-5x smaller than JSON and 5-10x faster to serialize/deserialize. HTTP/2 multiplexing eliminates the connection-per-request overhead of HTTP/1.1. Together, this reduces both network and CPU costs.
- When asked about streaming: gRPC natively supports four patterns (unary, server streaming, client streaming, bidirectional) because HTTP/2 supports bidirectional data flow on a single stream. No WebSocket, SSE, or long-polling workarounds needed.
- When discussing deadline propagation: gRPC sends remaining deadline time as a header on every downstream call. If a client cancels or times out, RST_STREAM propagates cancellation to all downstream services, preventing wasted work. REST has no built-in equivalent.
- When asked about load balancing: gRPC's long-lived connections break L4 load balancers. Solutions: client-side load balancing (gRPC channel resolves multiple addresses), L7 proxy (Envoy), or a service mesh (Istio/Linkerd). Standard round-robin at the TCP level creates hot spots.
- When discussing schema evolution: Protobuf uses field numbers on the wire, not field names. Add new fields with new numbers (old clients ignore them). Never reuse a deleted field number. Never change a field's type. This enables independent client/server deployments.
- When asked about HTTP/2 vs HTTP/1.1: HTTP/2 multiplexes streams over one TCP connection (no head-of-line blocking at HTTP level), uses binary framing (lower parsing cost), compresses headers with HPACK. Limitation: TCP-level HOL blocking persists (QUIC/HTTP/3 fixes this).
- When asked about gRPC in the browser: Use grpc-web (requires Envoy proxy to translate) or Connect (dual-protocol framework that serves gRPC and REST/JSON from the same .proto definition). Native browser gRPC support does not exist because browsers do not expose HTTP/2 framing to JavaScript.
Quick recap
- gRPC combines HTTP/2 (multiplexed binary transport) with Protobuf (compact binary serialization) and code generation to create a strongly-typed, efficient RPC framework.
- HTTP/2 multiplexing allows hundreds of concurrent RPCs over a single TCP connection, eliminating the connection-per-request overhead and head-of-line blocking of HTTP/1.1.
- Protobuf encodes messages 2-5x smaller than JSON using field numbers instead of field names and varint encoding for integers, with built-in backwards compatibility via field number stability.
- Four RPC types (unary, server streaming, client streaming, bidirectional) provide native streaming without WebSocket or SSE workarounds, all with per-stream HTTP/2 flow control.
- Deadlines and cancellation propagate automatically across service boundaries via grpc-timeout headers and HTTP/2 RST_STREAM frames, preventing wasted downstream work after client timeouts.
- gRPC is the standard for internal microservice communication, but REST remains preferred for public APIs due to browser support, CDN caching, curl debuggability, and broader tooling ecosystem.
Related concepts
- Serialization formats - Protobuf's wire format is just one of many serialization approaches; understanding the tradeoffs vs JSON, MessagePack, Avro, and Thrift helps you choose the right format.
- HTTP keep-alive - Connection reuse in HTTP/1.1 is a simpler version of gRPC's long-lived channels; understanding keep-alive helps explain why gRPC goes further with HTTP/2 multiplexing.
- Backpressure - HTTP/2 flow control is a transport-level backpressure mechanism, but gRPC applications also need application-level backpressure for streaming RPCs.
- Connection pooling - gRPC channels are specialized connection pools; understanding general connection pooling helps you configure channel options correctly.