RPC vs. messaging
The architectural difference between synchronous RPC calls (gRPC, REST) and asynchronous messaging (Kafka, SQS): when each model fits, the failure modes, and why the choice affects your entire service topology.
TL;DR
| Dimension | Choose RPC | Choose Messaging |
|---|---|---|
| Coupling | Caller needs the result to proceed (checkout needs payment confirmation) | Caller doesn't need the result immediately (send notification, update analytics) |
| Failure mode | Prefer fast failure with immediate error feedback | Prefer buffered retry with eventual delivery |
| Latency | Need sub-50ms response, direct call to callee | Can tolerate seconds-to-minutes of processing delay |
| Fan-out | One caller, one callee, point-to-point | One event, many consumers (notifications + analytics + inventory) |
| Traffic shape | Steady, predictable throughput | Bursty traffic that needs buffering (flash sales, batch jobs) |
Default answer: RPC for the user-facing request/response path, messaging for side-effects and background work. Most production systems use both. The checkout call is RPC (you need to know if payment succeeded), but the post-purchase email, analytics event, and inventory update go through a message broker.
The Framing
A team I worked with built an order service that made synchronous RPC calls to five downstream services: payment, inventory, notifications, analytics, and fraud detection. Every order went through all five before the user saw "Order confirmed."
Then the analytics service deployed a bad query that added 800ms to every response. Suddenly the order endpoint went from 200ms to 1,000ms. Users started abandoning checkout. But the analytics data wasn't even user-visible. It was a background metric pipeline holding the checkout hostage.
The fix was straightforward: payment and inventory stayed as RPC calls (the order can't complete without them). Notifications, analytics, and fraud detection moved to a Kafka topic. The order service publishes an "order.placed" event and returns immediately after payment + inventory succeed. The three background services consume the event at their own pace.
Order latency dropped back to 200ms. When analytics deploys a bad query now, its consumer falls behind, messages queue up, and nobody notices until the dashboards are delayed. The checkout path is completely unaffected.
This pattern is the core of the tradeoff: RPC couples services in time (both must be available simultaneously), messaging decouples them (the broker absorbs timing differences). The question is which services belong on the critical path and which don't.
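A minimal sketch of the split described above, assuming an in-memory stand-in for the Kafka topic. `InMemoryBroker`, `charge_payment`, `reserve_inventory`, and `place_order` are hypothetical names used only for illustration; the real services would make network calls where the placeholders return canned results.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBroker:
    """Hypothetical stand-in for a Kafka topic: an append-only list."""
    events: list = field(default_factory=list)

    def publish(self, topic: str, event: dict) -> None:
        self.events.append((topic, event))


def charge_payment(order: dict) -> bool:
    # Placeholder for the synchronous payment RPC.
    return order["total_cents"] > 0


def reserve_inventory(order: dict) -> bool:
    # Placeholder for the synchronous inventory RPC.
    return all(item["qty"] > 0 for item in order["items"])


def place_order(order: dict, broker: InMemoryBroker) -> str:
    # Critical path: the order cannot complete without these two results.
    if not charge_payment(order):
        return "payment_failed"
    if not reserve_inventory(order):
        return "out_of_stock"
    # Side effects: publish one event and return; consumers catch up on their own.
    broker.publish(
        "order.events",
        {"event_type": "order.placed", "order_id": order["order_id"]},
    )
    return "confirmed"


broker = InMemoryBroker()
order = {
    "order_id": "ord_abc123",
    "total_cents": 4999,
    "items": [{"sku": "WIDGET-1", "qty": 2}],
}
status = place_order(order, broker)
```

Only the two calls whose results gate the response sit between the request and the reply. Compensation (for example, refunding a charge when inventory reservation fails) is left out of this sketch.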
How Each Works
RPC: Synchronous Request/Response
RPC (Remote Procedure Call) makes a network call look like a local function call. Service A calls Service B, waits for the response, and continues. The caller blocks until the callee responds or times out.
# gRPC client calling a payment service
import grpc
from payment_pb2 import ChargeRequest
from payment_pb2_grpc import PaymentServiceStub

channel = grpc.insecure_channel("payment-service:50051")
client = PaymentServiceStub(channel)

# Blocks until a response arrives; raises grpc.RpcError if the timeout expires
response = client.Charge(
    ChargeRequest(
        order_id="ord_abc123",
        amount_cents=4999,
        currency="USD",
        idempotency_key="idem_xyz789",
    ),
    timeout=5.0,  # 5 second timeout
)

if response.status == "SUCCESS":
    proceed_with_order()
else:
    handle_payment_failure(response.error)
The strength of RPC is immediate feedback. You know right now whether the payment succeeded. You can make a decision on the next line of code. The programming model is simple: call a function, get a result.
The weakness is temporal coupling. Both services must be up at the same time. If the callee is slow, the caller is slow. If the callee is down, the caller fails (or times out). In a chain of RPC calls (A calls B calls C calls D), slowness or failure at any point cascades back through every caller.
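The cascade can be made concrete with a small arithmetic model. This is a sketch under simplifying assumptions: each service does its own work and then waits on the next hop, every caller applies the same timeout, and all latency figures are hypothetical. `call_chain` is an illustrative helper, not part of any RPC library.

```python
def call_chain(own_work_ms, timeout_ms):
    """Model a chain A -> B -> C -> D of blocking calls.

    own_work_ms[i] is the time service i spends on its own work, listed from
    the first caller to the deepest callee. Returns (response time seen at the
    first service, whether the whole chain succeeded).
    """
    response_ms = 0  # response time of the hop below the current one
    ok = True
    # Walk from the deepest callee back up to the first caller.
    for own in reversed(own_work_ms):
        if response_ms > timeout_ms:
            # The downstream call exceeded the timeout: this caller waited
            # the full timeout, gave up, and now fails as well.
            response_ms = timeout_ms
            ok = False
        response_ms = own + response_ms
    return response_ms, ok


healthy = call_chain([10, 20, 30, 40], timeout_ms=5000)
slow_leaf = call_chain([10, 20, 30, 840], timeout_ms=5000)
slow_leaf_tight_timeout = call_chain([10, 20, 30, 840], timeout_ms=500)
```

With a generous timeout, the extra 800ms at the deepest service shows up unchanged at the top of the chain. With a tight timeout, the latency is capped but every caller in the chain fails instead.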
Two styles dominate synchronous service-to-service calls:
| Feature | REST (HTTP/JSON) | gRPC (HTTP/2 + Protobuf) |
|---|---|---|
| Serialization | JSON, human-readable, often several times larger on the wire | Protobuf binary, compact, often several times smaller |
| Contract | OpenAPI spec (optional) | .proto file (required, code-generated) |
| Streaming | Not native (SSE, WebSocket for workarounds) | Bidirectional streaming built in |
| Latency | ~1-5ms serialization overhead | ~0.1-0.5ms serialization overhead |
| Browser support | Native | Requires grpc-web proxy |
| Tooling | curl, Postman, any HTTP client | Requires protoc, language-specific stubs |
My rule: REST for public APIs and browser-facing services. gRPC for internal service-to-service communication where latency and type safety matter.
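To get a feel for why a binary encoding is more compact than JSON, here is a rough illustration that uses Python's `struct` module as a stand-in for a binary format. This is not Protobuf: Protobuf adds field tags and variable-length integers, so real sizes will differ. The fixed field layout below is an assumption made for the example.

```python
import json
import struct

payload = {"order_id": "ord_abc123", "amount_cents": 4999, "currency": "USD"}

# Text encoding: field names and punctuation travel with every message.
json_bytes = json.dumps(payload).encode("utf-8")

# Fixed-layout binary: 10-byte order ID, unsigned 32-bit amount, 3-byte currency.
# The schema lives in code on both sides, so no field names are sent.
LAYOUT = "<10sI3s"
binary_bytes = struct.pack(
    LAYOUT,
    payload["order_id"].encode("ascii"),
    payload["amount_cents"],
    payload["currency"].encode("ascii"),
)

# Decoding needs the same layout, which is the role a .proto contract plays.
order_id, amount_cents, currency = struct.unpack(LAYOUT, binary_bytes)
```

The trade is the same one the table describes: the binary form is smaller, but it is unreadable without the shared contract.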
Messaging: Asynchronous Fire-and-Forget
Messaging decouples the sender from the receiver with a broker in between. The sender publishes a message and moves on immediately. The broker stores the message durably. Consumers read messages at their own pace.
# Kafka producer: publish order event
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # A send counts as successful only after all in-sync replicas acknowledge
    retries=3,
)

# send() is asynchronous: it returns a future right away and the broker ack arrives later
producer.send(
    topic="order.events",
    key=b"ord_abc123",  # Partition by order ID for ordering
    value={
        "event_type": "order.placed",
        "order_id": "ord_abc123",
        "user_id": "usr_456",
        "items": [{"sku": "WIDGET-1", "qty": 2}],
        "total_cents": 4999,
        "timestamp": "2025-03-15T14:30:00Z",
    },
)
# Producer continues immediately, doesn't wait for consumers

# Before shutdown, block until buffered messages have been delivered
producer.flush()
The strength is decoupling in time and space. The producer doesn't know or care which consumers exist, how many there are, or whether they're currently running. If a consumer crashes, messages queue up and are delivered when it recovers. If traffic spikes 10x, the broker absorbs the burst while consumers process at a sustainable rate.
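The burst-absorption claim can be checked with a few lines of arithmetic. This is a toy model with hypothetical rates: `simulate_backlog` is an illustrative helper that tracks how many messages sit in the broker at the end of each second when arrivals spike but the consumer drains at a fixed rate.

```python
def simulate_backlog(arrivals_per_sec, consume_rate_per_sec):
    """Return the broker backlog (queued messages) at the end of each second."""
    backlog = 0
    history = []
    for arrived in arrivals_per_sec:
        backlog += arrived
        # The consumer drains at most its sustainable rate each second.
        backlog -= min(backlog, consume_rate_per_sec)
        history.append(backlog)
    return history


# Hypothetical traffic: 100 msg/s baseline, a 3-second burst at 10x, then baseline.
arrivals = [100, 1000, 1000, 1000, 100, 100, 100, 100, 100, 100]
history = simulate_backlog(arrivals, consume_rate_per_sec=400)
peak_backlog = max(history)
```

In this model the backlog peaks at 1,800 messages and drains to zero six seconds after the burst ends. The producer never slowed down; the cost of the burst was paid as processing delay rather than as failed requests.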
The weakness is delayed feedback. The producer doesn't know whether the consumer successfully processed the message. Failures surface later, often minutes after the fact, through a dead letter queue or a consumer lag alert rather than as an immediate error. Debugging is harder because the request and its processing are separated in time.
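A sketch of how a consumer might cope with this, assuming an in-memory queue in place of a real broker. The retry budget, the `consume` function, and the `send_email` handler are hypothetical. The two ideas shown are a dead letter list for messages that keep failing and an idempotency check for messages delivered more than once.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget


def consume(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages, retrying failures and parking repeat offenders."""
    queue = deque((msg, 1) for msg in messages)
    processed_ids = set()  # idempotency guard against duplicate deliveries
    dead_letters = []
    while queue:
        msg, attempt = queue.popleft()
        if msg["order_id"] in processed_ids:
            continue  # already handled once, skip the duplicate
        try:
            handler(msg)
            processed_ids.add(msg["order_id"])
        except Exception as exc:
            if attempt < max_attempts:
                # Retry later, behind the messages already waiting.
                queue.append((msg, attempt + 1))
            else:
                dead_letters.append({"message": msg, "error": str(exc)})
    return processed_ids, dead_letters


def send_email(msg):
    # Hypothetical handler that always fails for one poisoned message.
    if msg["order_id"] == "ord_bad":
        raise ValueError("missing email address")


messages = [
    {"order_id": "ord_1"},
    {"order_id": "ord_bad"},
    {"order_id": "ord_1"},  # duplicate delivery
    {"order_id": "ord_2"},
]
processed, dead = consume(messages, send_email)
```

The failure of `ord_bad` is never reported to whoever published it. It only becomes visible when someone inspects the dead letter list, which is the delayed feedback described above.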
Head-to-Head Comparison