Ambassador pattern
How the ambassador proxy intercepts your service's outbound calls to handle retries, circuit breaking, mTLS, and observability without touching application code. Where it fits, how it differs from a full service mesh, and the real costs.
The problem
Your platform has 20 microservices. The order service calls the payment service, the inventory service, and the notification service. The payment service calls the fraud service and the ledger service. The services are owned by different teams and written in different languages.
Six months ago one engineer added retry logic to the order service: three retries, 100ms fixed delay. Another engineer copied that logic into the inventory service. A third engineer added circuit breaking to the payment service, but got the threshold wrong. The fraud service has no retry logic at all because the team "didn't get around to it." The notification service has retries but they are not idempotency-safe, so failures sometimes send duplicate emails.
Now you need to enforce a company-wide policy: all service calls must use exponential backoff with jitter, circuit break after 5 consecutive failures, inject a correlation ID header, and emit a trace span. To do this, you must change 20 codebases, in 6 languages, owned by 8 teams, deployed on different schedules. And the policy will drift again within weeks.
This is the problem the ambassador pattern solves.
What it is
The ambassador pattern places a dedicated proxy process in the same network namespace as your service. The service sends all outbound calls to the ambassador on localhost. The ambassador intercepts every call, applies cross-cutting logic (retries, circuit breaking, header injection, mTLS, metrics), and then forwards to the real destination.
Think of it like a mail room in a large office building. Every letter an employee sends goes through the mail room first. The mail room stamps it, routes it to the right building on the delivery network, logs it, and handles returned mail. The employee just addresses the envelope. The mail room handles everything else, consistently, for everyone.
The key insight: the application code changes once (point outbound calls at localhost instead of the real service address). Every policy change happens in the ambassador configuration. One config file, one deployment, 20 services updated simultaneously.
How it works
Every outbound call from the application process becomes a local loopback connection to the ambassador process. The ambassador maintains a connection pool to upstream services and applies the full request pipeline before forwarding.
The ambassador exposes a listener on a well-known port (Istio configures its Envoy sidecar to use 15001 for outbound traffic). All outbound traffic is redirected to this port using an iptables rule or OS-level hook, so the application does not need to know the ambassador's address at all.
# How traffic interception works (Envoy/Istio injection)
# Simplified iptables rules added during pod startup:
# 1. Let the proxy's own traffic through untouched (Istio runs Envoy as UID 1337);
#    without this, Envoy's upstream connections would loop back into Envoy
iptables -t nat -A OUTPUT -m owner --uid-owner 1337 -j RETURN
# 2. Redirect all other outbound TCP to Envoy's listener at port 15001
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001
# Envoy recovers the original destination using the SO_ORIGINAL_DST socket option
# and routes to the intended upstream
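To make the SO_ORIGINAL_DST step concrete, here is a minimal Python sketch of how a proxy process could ask the Linux kernel where a redirected IPv4 connection was originally headed. The function names are hypothetical, the constant value 80 comes from the Linux netfilter headers rather than Python's socket module, and this is an illustration of the mechanism, not Envoy's implementation.

```python
import socket
import struct

# Linux constant from <linux/netfilter_ipv4.h>; Python's socket module does not export it
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes) -> tuple[str, int]:
    """Decode a 16-byte IPv4 sockaddr_in structure into (ip, port)."""
    # Layout: 2-byte family, 2-byte port (network order), 4-byte address, 8 bytes padding
    port, packed_ip = struct.unpack("!2xH4s8x", raw[:16])
    return socket.inet_ntoa(packed_ip), port


def original_destination(conn: socket.socket) -> tuple[str, int]:
    """Return the pre-REDIRECT destination of an accepted connection (Linux only)."""
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

A proxy would call original_destination on each socket returned by accept() and then open its own upstream connection to that address.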
Once inside the ambassador, the call flows through a pipeline of filters, applied in order, that covers retries, circuit breaking, header injection, mTLS, and metrics.
The application is completely unaware of this pipeline. From the application's perspective, it makes an HTTP call and gets a response. Whether that response involved a retry, an open circuit, or an mTLS handshake is invisible to the application code.
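The idea of a filter pipeline can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Request type, the x-correlation-id header name, the two filters, and their order. A real proxy such as Envoy has its own filter API and configuration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    method: str
    url: str
    headers: dict[str, str] = field(default_factory=dict)


Filter = Callable[[Request], Request]


def inject_correlation_id(req: Request) -> Request:
    # Keep an existing ID so that one ID follows the call across hops
    req.headers.setdefault("x-correlation-id", str(uuid.uuid4()))
    return req


def record_metrics(counters: dict[str, int]) -> Filter:
    # Returns a filter that counts requests per method and URL
    def _filter(req: Request) -> Request:
        key = f"{req.method} {req.url}"
        counters[key] = counters.get(key, 0) + 1
        return req
    return _filter


def run_pipeline(req: Request, filters: list[Filter]) -> Request:
    # Apply each filter in order; the caller never sees these steps
    for f in filters:
        req = f(req)
    return req


counters: dict[str, int] = {}
outbound = run_pipeline(
    Request("GET", "http://inventory/stock/42"),
    [inject_correlation_id, record_metrics(counters)],
)
```

Adding a new company-wide policy then means adding one filter to the list, not editing every service.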
Retry logic internals
The ambassador implements exponential backoff with full jitter, which is meaningfully different from fixed-delay retry or exponential backoff without jitter.
Fixed-delay retry is dangerous. If 1000 requests fail simultaneously (for example, during a downstream restart), all 1000 retry after exactly 500ms, creating a thundering herd that hits the restarted service in one synchronized burst. Exponential backoff without jitter does not fix this: clients that failed together still compute the same delays, so the retries stay synchronized and only the gaps between bursts grow. Full jitter spreads retries uniformly across the backoff window.
import random
import time

RETRYABLE_CODES = {429, 503}
BASE_MS = 100     # base delay 100ms
CAP_MS = 30_000   # max backoff 30s in ms

def retry_with_jitter(request, forward, max_attempts):
    # forward(request) sends the request upstream and returns an object with a .status
    response = None
    for attempt in range(1, max_attempts + 1):
        response = forward(request)
        if response.status not in RETRYABLE_CODES:
            return response
        if attempt == max_attempts:
            break  # out of attempts: return the last failure
        # Full jitter: random between 0 and min(cap, base * 2^attempt)
        sleep_ms = random.uniform(0, min(CAP_MS, BASE_MS * 2 ** attempt))
        time.sleep(sleep_ms / 1000)
    return response

# Why full jitter: at attempt=3, the window is min(30000, 100*8) = 800ms
# With 1000 concurrent retriers, they spread uniformly across 0-800ms
# Not synchronized. No thundering herd.
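The difference between the three strategies is easy to check with a small simulation. The sketch below assumes 1000 clients that all failed at the same instant and counts how many distinct millisecond buckets their next retry lands in. The function names and the fixed seed are illustrative choices; the 500ms fixed delay, 100ms base, and 30s cap are the values used in this section.

```python
import random

BASE_MS = 100
CAP_MS = 30_000


def fixed_delay(attempt: int, rng: random.Random) -> float:
    return 500.0


def exponential_no_jitter(attempt: int, rng: random.Random) -> float:
    return float(min(CAP_MS, BASE_MS * 2 ** attempt))


def full_jitter(attempt: int, rng: random.Random) -> float:
    return rng.uniform(0, min(CAP_MS, BASE_MS * 2 ** attempt))


def distinct_retry_times(strategy, clients: int = 1000, attempt: int = 3, seed: int = 0) -> int:
    """Count the distinct millisecond buckets in which the clients retry."""
    rng = random.Random(seed)
    return len({int(strategy(attempt, rng)) for _ in range(clients)})


for strategy in (fixed_delay, exponential_no_jitter, full_jitter):
    print(strategy.__name__, distinct_retry_times(strategy))
```

The two strategies without jitter put every client in a single bucket, while full jitter scatters them across hundreds of buckets inside the 0-800ms window.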
Critically, the ambassador retries only requests that are safe to repeat. Retrying a failed POST /payment/charge could charge the user twice. The retry policy must be configured per route and per HTTP method. GET requests are safe to retry. POST requests require either a server-side idempotency key or must be explicitly marked as retry-safe.
The real-world rule: retries happen on 429 Too Many Requests, 503 Service Unavailable, and connection errors. They do not happen on 400 Bad Request, 401 Unauthorized, or 5xx errors from POST/PUT/PATCH without explicit idempotency configuration.
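One conservative reading of this rule can be written as a small decision function. The names should_retry, retry_safe, and NEEDS_OPT_IN are hypothetical, and the choice to require opt-in for every retry of POST, PUT, and PATCH (including on 429 and connection errors) is an assumption made here to keep the sketch simple, not a statement about any particular proxy's defaults.

```python
from typing import Optional

RETRYABLE_STATUSES = {429, 503}
# Methods that are retried only when the route is explicitly marked retry-safe
NEEDS_OPT_IN = {"POST", "PUT", "PATCH"}


def should_retry(method: str, status: Optional[int], retry_safe: bool = False) -> bool:
    """Decide whether a failed call may be retried.

    status=None models a connection error (no response was received).
    retry_safe=True models a route configured with an idempotency guarantee.
    """
    retryable_condition = status is None or status in RETRYABLE_STATUSES
    if not retryable_condition:
        return False
    if method.upper() in NEEDS_OPT_IN:
        return retry_safe
    return True
```

With this policy, a GET that returns 503 is retried, a POST that returns 503 is not, and the same POST is retried once its route is marked retry_safe.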
Circuit breaker state machine
The circuit breaker inside the ambassador tracks health per upstream cluster and toggles between three states. Once you understand these states, you understand why the ambassador is not just "retries with extra steps."
CLOSED is normal operation. Every request passes through. The ambassador counts failures in a rolling time window (for example, 5 failures in 10 seconds). When the threshold is crossed, the circuit opens.
OPEN means the upstream is considered unavailable. All requests fail immediately without touching the network. The application sees a 503 instantly rather than waiting for a 30-second TCP timeout. This is the critical property: OPEN state protects your thread pool. A slow downstream cannot hold 1000 threads waiting when the circuit is open.
HALF-OPEN is a probe state. After the sleep window (30 seconds), the ambassador lets exactly one request through. If it succeeds, the circuit closes again. If it fails, the circuit reopens and the sleep window resets.
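The three states can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not the implementation of any real proxy: the class and method names are hypothetical, the defaults (5 failures in 10 seconds, 30-second sleep window) are the example values from this section, and the clock is injectable so the transitions can be exercised without waiting.

```python
import time
from collections import deque
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, window_s: float = 10.0,
                 sleep_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._window_s = window_s
        self._sleep_s = sleep_s
        self._clock = clock
        self._failures: deque[float] = deque()  # timestamps of recent failures
        self._opened_at = 0.0
        self._probe_in_flight = False
        self.state = "CLOSED"

    def allow_request(self) -> bool:
        now = self._clock()
        if self.state == "OPEN" and now - self._opened_at >= self._sleep_s:
            # Sleep window elapsed: move to the probe state
            self.state = "HALF_OPEN"
            self._probe_in_flight = False
        if self.state == "HALF_OPEN":
            if self._probe_in_flight:
                return False  # only one probe at a time
            self._probe_in_flight = True
            return True
        return self.state == "CLOSED"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.state = "CLOSED"
            self._failures.clear()

    def record_failure(self) -> None:
        now = self._clock()
        if self.state == "OPEN":
            return  # late result from a request sent before the circuit opened
        if self.state == "HALF_OPEN":
            self._trip(now)  # probe failed: reopen and restart the sleep window
            return
        self._failures.append(now)
        # Drop failures that have fallen out of the rolling window
        while self._failures and now - self._failures[0] > self._window_s:
            self._failures.popleft()
        if len(self._failures) >= self._threshold:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = "OPEN"
        self._opened_at = now
        self._failures.clear()
```

The caller checks allow_request() before forwarding; when it returns False, the proxy answers with an immediate 503 and never touches the network.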