gRPC internals
How gRPC works under the hood: HTTP/2 multiplexing, Protobuf binary framing, the four RPC types (unary/server streaming/client streaming/bidirectional), connection management, and operational tradeoffs vs REST.
The problem
Your e-commerce platform has 40 microservices. Each service calls 3-5 others per request. The order service calls inventory, payment, shipping, and notification services, all via REST/JSON over HTTP/1.1. At 10,000 orders per second, the system makes 40,000-50,000 inter-service HTTP calls per second.
Each REST call sends a verbose JSON payload with repeated field names, opens a new TCP connection (or waits for one from a pool), serializes and deserializes through a text parser, and runs over HTTP/1.1, where a request queues behind earlier requests on the same connection. JSON parsing alone consumes 15% of your service's compute. Connection management adds latency, and there is no schema enforcement: a field-name typo ships to production and fails at runtime.
This is the problem gRPC solves. It replaces text with binary, multiplexes hundreds of RPCs over a single connection, enforces a typed contract at compile time, and provides native streaming without WebSocket workarounds.
What it is
gRPC is a remote procedure call framework that combines HTTP/2 as the transport protocol with Protocol Buffers (Protobuf) as the serialization format, wrapped in a code-generation system that produces typed client and server stubs from a shared .proto schema.
Think of it like a phone call vs sending letters. REST is like sending letters: you write a message in English (JSON), put it in an envelope (HTTP/1.1), address it, and wait for a reply letter. gRPC is like a phone call: you dial once (establish an HTTP/2 connection), speak directly in a shared language (Protobuf binary), and both sides can talk simultaneously (streaming). The phone line stays open for multiple conversations.
How it works
A gRPC call flows through several layers. The application calls a generated stub, which serializes the request to Protobuf binary and wraps it in a gRPC frame (a 5-byte header holding a compression flag and the message length). The frame travels over an HTTP/2 stream with HPACK-compressed headers, and the server reverses the process.
The RPC method name maps to an HTTP/2 path: /PackageName.ServiceName/MethodName. Headers are compressed using HPACK (HTTP/2 header compression), which is especially efficient for gRPC because consecutive calls to the same method share nearly identical headers.
// gRPC wire format for each message:
[1 byte] compressed flag (0 = no, 1 = yes)
[4 bytes] message length (big-endian uint32)
[N bytes] serialized protobuf payload
// Full HTTP/2 frame sequence for a unary RPC:
Client → Server: HEADERS frame (method, path, content-type)
Client → Server: DATA frame (gRPC message above)
Client → Server: END_STREAM (a flag set on the last frame the client sends, not a separate frame type)
Server → Client: HEADERS frame (content-type)
Server → Client: DATA frame (gRPC response message)
Server → Client: HEADERS frame (trailers: grpc-status, grpc-message) + END_STREAM
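The 5-byte message prefix and the path mapping described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code any gRPC implementation actually runs; the helper names (frame_message, unframe_message, rpc_path) and the package name shop.v1 are hypothetical.

```python
import struct


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the 5-byte gRPC header."""
    # ">BI" = big-endian, 1-byte flag followed by a 4-byte unsigned length.
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload


def unframe_message(data: bytes) -> tuple[bool, bytes, bytes]:
    """Split one framed message off the front of data.

    Returns (compressed, payload, remaining_bytes).
    """
    if len(data) < 5:
        raise ValueError("need at least 5 header bytes")
    flag, length = struct.unpack(">BI", data[:5])
    end = 5 + length
    if len(data) < end:
        raise ValueError("incomplete message body")
    return bool(flag), data[5:end], data[end:]


def rpc_path(package: str, service: str, method: str) -> str:
    """Build the HTTP/2 :path value for an RPC method."""
    return f"/{package}.{service}/{method}"


framed = frame_message(b"\x08\xb9\x60")  # a 3-byte protobuf payload
print(framed.hex(" "))
print(rpc_path("shop.v1", "UserService", "GetUser"))
```

Because the length is carried in the prefix, a receiver can pull complete messages out of a stream of DATA frames without knowing anything about the Protobuf schema.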
HTTP/2 multiplexing
HTTP/1.1 has head-of-line blocking: responses on a connection come back in request order. If Request 1 takes 500ms, Request 2 waits behind it even if its response is ready in 5ms. Browsers work around this by opening several parallel TCP connections per host (typically up to six), which costs memory and repeats TCP slow start and the TLS handshake for each connection.
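The 500ms/5ms example works out as follows. This is a deliberately simplified model, assuming requests on one HTTP/1.1 connection are handled strictly one after another and that independent streams do not compete for bandwidth; both function names are hypothetical.

```python
def serial_completion_times(service_ms: list[int]) -> list[int]:
    """One connection, in-order responses: each request waits for the ones before it."""
    done, clock = [], 0
    for ms in service_ms:
        clock += ms
        done.append(clock)
    return done


def independent_completion_times(service_ms: list[int]) -> list[int]:
    """Independent streams: each response finishes when its own work is done."""
    return list(service_ms)


requests = [500, 5]  # Request 1 takes 500ms, Request 2 takes 5ms
print(serial_completion_times(requests))       # [500, 505]
print(independent_completion_times(requests))  # [500, 5]
```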
HTTP/2 multiplexes multiple independent streams over a single TCP connection. Each gRPC call gets its own stream ID, and frames from different streams interleave freely on the wire.
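A toy model of that interleaving, assuming strict round-robin scheduling (real HTTP/2 stacks schedule frames based on flow control and their own policies). The functions interleave and demultiplex are hypothetical names; the stream IDs are odd because client-initiated HTTP/2 streams use odd numbers.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave(streams: dict[int, list[bytes]]) -> list[tuple[int, bytes]]:
    """Round-robin frames from several streams onto one ordered connection."""
    wire = []
    ids = sorted(streams)
    for row in zip_longest(*(streams[i] for i in ids)):
        for stream_id, chunk in zip(ids, row):
            if chunk is not None:  # this stream has no more frames
                wire.append((stream_id, chunk))
    return wire


def demultiplex(wire: list[tuple[int, bytes]]) -> dict[int, bytes]:
    """Reassemble each stream's bytes using the stream ID on every frame."""
    per_stream = defaultdict(list)
    for stream_id, chunk in wire:
        per_stream[stream_id].append(chunk)
    return {sid: b"".join(chunks) for sid, chunks in per_stream.items()}


calls = {1: [b"AAA", b"AAA"], 3: [b"B"], 5: [b"CC", b"CC", b"CC"]}
wire = interleave(calls)
print([sid for sid, _ in wire])  # [1, 3, 5, 1, 5, 5]
print(demultiplex(wire))
```

The short call on stream 3 completes after the first pass, without waiting for the longer calls on streams 1 and 5.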
For gRPC, this means a single connection handles hundreds of concurrent RPCs. I've seen production gRPC channels sustain 5,000+ concurrent streams on a single TCP connection. The per-RPC overhead is tiny: a few bytes of framing instead of a new TCP handshake.
HTTP/2 also provides flow control per stream. If one RPC's client falls behind in reading a server-streaming response, HTTP/2 flow control pauses only that stream. Other streams continue unaffected. This is critical for gRPC streaming: a slow consumer on one stream does not block fast consumers on other streams.
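Per-stream flow control can be sketched with a byte budget per stream. This is a simplified model with hypothetical names (Stream, send_round): it tracks only the per-stream window and ignores the connection-level window that HTTP/2 also maintains.

```python
from collections import deque


class Stream:
    def __init__(self, stream_id: int, window: int, frames: list[bytes]):
        self.stream_id = stream_id
        self.window = window          # bytes the receiver will still accept
        self.pending = deque(frames)  # frames waiting to be sent

    def window_update(self, increment: int) -> None:
        """The receiver has consumed data and grants more credit."""
        self.window += increment


def send_round(streams: list[Stream]) -> list[int]:
    """Send at most one frame per stream; return the IDs that made progress."""
    progressed = []
    for s in streams:
        if s.pending and len(s.pending[0]) <= s.window:
            s.window -= len(s.pending.popleft())
            progressed.append(s.stream_id)
    return progressed


slow = Stream(1, window=10, frames=[b"x" * 10] * 3)   # slow consumer, small window
fast = Stream(3, window=100, frames=[b"y" * 10] * 3)  # fast consumer

rounds = [send_round([slow, fast]) for _ in range(3)]
print(rounds)  # [[1, 3], [3], [3]]

slow.window_update(10)  # the slow reader finally catches up
print(send_round([slow, fast]))  # [1]
```

Stream 1 stalls once its window is exhausted, while stream 3 keeps sending until it is done.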
HTTP/2 solves head-of-line blocking at the HTTP level, but TCP still has HOL blocking at the transport level. If a single TCP segment is lost, all streams on that connection stall until the segment is retransmitted. QUIC (HTTP/3) solves this by implementing streams at the transport layer, but gRPC over QUIC is still experimental.
Protobuf serialization
JSON is text. Every field name, every bracket, every quote character is serialized and transmitted. Protobuf replaces field names with small integer tags and uses binary encoding:
HTTP/1.1 REST + JSON:
POST /users HTTP/1.1
Content-Type: application/json
{"user_id": 12345, "name": "Alice", "email": "alice@example.com"}
(65-byte JSON body, text, must be parsed character by character)
gRPC + Protobuf:
gRPC message header (5 bytes): [compressed flag][message length]
Protobuf payload:
field 1 (user_id, varint): \x08\xB9\x60 (3 bytes)
field 2 (name, string): \x12\x05Alice (7 bytes)
field 3 (email, string): \x1A\x11alice@example.com (19 bytes)
(34 bytes total: 5-byte header + 29-byte payload; binary, schema-defined)
Protobuf uses varint encoding for integers: small numbers (0-127) fit in a single byte. Field numbers (1, 2, 3) replace field names, so "user_id" never appears on the wire. The wire type (varint, length-delimited, fixed32, fixed64) is packed into the low three bits of the tag, so field numbers 1-15 need only a single tag byte.
// Protobuf wire format for field 1 (user_id = 12345):
Tag byte: 0x08 = (field_number=1 << 3) | wire_type=0 (varint)
Value: 0xB9 0x60 = varint encoding of 12345
// Protobuf wire format for field 2 (name = "Alice"):
Tag byte: 0x12 = (field_number=2 << 3) | wire_type=2 (length-delimited)
Length: 0x05 = 5 bytes
Value: "Alice" (raw UTF-8 bytes)
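The byte layout above can be reproduced with a small hand-written encoder. This is a sketch for illustration only: real services use generated Protobuf code, the helper names are hypothetical, and the encoder handles just non-negative integers and strings. Unlike proto3, it always writes every field, even when the value is the default.

```python
import json

WIRE_VARINT = 0  # wire type for int32, int64, bool, enum
WIRE_LEN = 2     # wire type for strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer 7 bits at a time, low bits first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    raw = text.encode("utf-8")
    return encode_tag(field_number, WIRE_LEN) + encode_varint(len(raw)) + raw


def encode_user(user_id: int, name: str, email: str) -> bytes:
    return (
        encode_tag(1, WIRE_VARINT) + encode_varint(user_id)
        + encode_string_field(2, name)
        + encode_string_field(3, email)
    )


payload = encode_user(12345, "Alice", "alice@example.com")
as_json = json.dumps(
    {"user_id": 12345, "name": "Alice", "email": "alice@example.com"}
).encode("utf-8")

print(payload.hex(" "))
print(len(payload), "Protobuf bytes vs", len(as_json), "JSON bytes")
```

Printing the hex dump makes it easy to compare each tag, length, and value byte against the breakdown above.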
The .proto schema is shared between client and server at build time:
syntax = "proto3";
message User {
int32 user_id = 1; // field number 1, varint encoding
string name = 2; // field number 2, length-delimited
string email = 3; // field number 3, length-delimited
}
Backwards compatibility is built into the wire format. New fields added to the schema are ignored by old clients (unknown tag numbers are skipped). Fields that are absent from a message, for example because the sender's schema no longer has them, decode to their zero values. This is why Protobuf uses field numbers instead of field names: as long as you never reuse a field number or change its type, evolution is safe. Marking removed numbers as reserved in the .proto file guards against accidental reuse.
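Skipping unknown fields works because every tag carries its wire type, which tells a reader how many bytes to consume even when it does not recognize the field number. A minimal decoder sketch, assuming only the four wire types mentioned earlier; decode_known and read_varint are hypothetical helpers, not part of any Protobuf library.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode the fields listed in known; skip everything else by wire type."""
    fields: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 1:  # fixed64
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 5:  # fixed32
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:
            fields[known[field_number]] = value
        # Unknown field numbers: the bytes were consumed, the value is dropped.
    return fields


# A message written with the three-field schema (user_id, name, email).
new_message = b"\x08\xb9\x60\x12\x05Alice\x1a\x11alice@example.com"

# An old reader that only knows fields 1 and 2 skips field 3 cleanly.
old_reader = decode_known(new_message, {1: "user_id", 2: "name"})
print(old_reader)  # {'user_id': 12345, 'name': b'Alice'}

# A new reader given an old message sees email as absent and falls back to "".
new_reader = decode_known(b"\x08\xb9\x60", {1: "user_id", 2: "name", 3: "email"})
print(new_reader.get("email", ""))
```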
The four RPC types
gRPC supports four RPC patterns, all built natively on HTTP/2 streams:
service UserService {
// Unary: one request, one response
rpc GetUser (GetUserRequest) returns (User);
// Server streaming: one request, stream of responses
rpc ListUsers (ListUsersRequest) returns (stream User);
// Client streaming: stream of requests, one response
rpc BulkCreateUsers (stream CreateUserRequest) returns (BulkCreateResponse);
// Bidirectional streaming: both sides stream independently
rpc ChatStream (stream ChatMessage) returns (stream ChatMessage);
}
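The call shape of each pattern can be modeled with plain Python functions and generators. No gRPC library is involved here: the function names mirror the rpc names above but are hypothetical, the in-memory USERS table stands in for a real backend, and strings stand in for the message types.

```python
from typing import Iterable, Iterator

USERS = {1: "Alice", 2: "Bob"}  # hypothetical in-memory data


def get_user(user_id: int) -> str:
    """Unary: one request in, one response out."""
    return USERS[user_id]


def list_users(prefix: str) -> Iterator[str]:
    """Server streaming: one request in, responses yielded one at a time."""
    for name in USERS.values():
        if name.startswith(prefix):
            yield name


def bulk_create_users(names: Iterable[str]) -> int:
    """Client streaming: consume a stream of requests, return one summary."""
    created = 0
    for _name in names:
        created += 1  # a real handler would persist each user here
    return created


def chat_stream(messages: Iterable[str]) -> Iterator[str]:
    """Bidirectional: reply while the request stream is still being read."""
    for message in messages:
        yield f"echo: {message}"


print(get_user(1))
print(list(list_users("")))
print(bulk_create_users(iter(["Carol", "Dave"])))
print(list(chat_stream(iter(["hi", "bye"]))))
```

In the bidirectional case the reply generator pulls from the request iterator lazily, so the first reply can be produced before the second request has arrived, which is the property that distinguishes it from two back-to-back unary calls.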