Stripe: Building for idempotency
How Stripe's idempotency key design prevents duplicate charges, handles network failures gracefully, and provides a blueprint for safe financial API design.
TL;DR
- Network timeouts between client and server create an impossible question: "Did my request succeed?" Without idempotency, retrying a payment request risks charging the customer twice.
- Stripe requires callers to provide a client-generated Idempotency-Key header on every mutating API call. The server stores the key alongside the result, replaying the stored response on any retry.
- The design uses an atomic findOrCreate operation against a unique-indexed idempotency_keys table, with a pending intermediate state to handle concurrent retries safely.
- Keys are scoped per API key, validated against a request body hash, and expired after 24 hours.
- This pattern has become the industry standard for payment APIs and applies to any operation that must execute at-most-once: email sends, webhook deliveries, ledger entries.
The Trigger
Every payment API eventually faces the same failure mode. A client sends a POST /v1/charges request. The server processes it, debits the customer's card, and begins writing the response. Then the TCP connection drops. The client never receives a response.
From the client's perspective, there are two possibilities: the charge succeeded (and the response was lost in transit) or the charge never reached Stripe's processing pipeline. The client has no way to distinguish between these cases. The only safe option is to retry.
Without idempotency, that retry creates a second charge. The customer sees two $50.00 debits on their statement. Support tickets pile up. Trust erodes. I've worked with payment integrations where this exact scenario caused thousands of dollars in duplicate charges during a single network blip, and the engineering team spent weeks reconciling the mess.
At Stripe's scale (processing hundreds of billions of dollars annually across millions of API calls per minute), network timeouts are not edge cases. They are a constant. Load balancer connection resets, client-side timeout configurations, TLS handshake failures, and cloud provider network partitions all produce the same result: a request that may or may not have succeeded.
Double-charging is the worst UX failure
Users forgive slow pages. They forgive occasional errors. They do not forgive being charged twice. A single duplicate charge triggers a support ticket, a potential chargeback, and permanent distrust of the platform. For payment APIs, idempotency is not a nice-to-have. It is a correctness requirement.
This scenario is not theoretical. Before Stripe's idempotency system, this failure mode was a real and frequent source of customer complaints across the payments industry. The entire pattern exists because of this one scenario.
The System Before
Payment APIs in the early 2010s handled retries in one of two ways, both flawed.
Approach 1: No retry safety. The API is stateless. Every request executes independently. If a client retries, a duplicate charge happens. The burden falls on the client to implement their own deduplication, which most clients get wrong.
Approach 2: Server-generated transaction IDs. The server returns a transaction ID after processing. The client can use this ID to check status before retrying. The problem: if the original request's response was lost, the client never received the transaction ID. You are back to square one.
The fundamental flaw in both approaches is the same: the deduplication signal does not exist before the first request. Any solution that requires the server to generate the deduplication key cannot handle the case where the server's response is lost. This is the insight that drove Stripe's design.
How other companies handled it
Stripe was not the first to face this problem, but most existing solutions had significant limitations.
| Provider | Retry approach | Limitation |
|---|---|---|
| PayPal (early API) | Server-generated txn_id on response | Useless if the response is lost |
| Braintree | Client submits order_id, dedup on match | Breaks for recurring charges with same order |
| Early Stripe (pre-2015) | No deduplication | Duplicate charges on retry |
| Amazon Pay | Request-level ReferenceId | Close to Stripe's eventual design, but scoping was more restrictive |
The pattern Stripe settled on, client-generated keys with server-side storage, was not entirely novel. But their implementation of the pending state, request hash validation, and automatic SDK integration set the standard that the rest of the industry adopted.
Why Not Just Use Database Unique Constraints?
The obvious first thought: "Why not deduplicate on the server using business fields?" For example, reject a second charge if (customer_id, amount, currency, timestamp) matches a recent charge within a window.
This breaks down immediately in practice.
A customer legitimately orders two items at the same price within seconds. A subscription service charges the same amount monthly. A marketplace splits a payment into identical sub-charges. Business-field deduplication cannot distinguish a legitimate duplicate from a retry of a failed request.
I've seen teams try fuzzy time-window deduplication ("reject charges with the same amount within 60 seconds"), and it creates a different nightmare: legitimate charges get rejected, and the window tuning becomes an endless game of whack-a-mole.
Server-generated request IDs (like a UUID assigned on receipt) also fail. If the server assigns a request ID and the response carrying that ID is lost, the client cannot reference it during retry. The client needs to own the deduplication key before the first request ever leaves the client.
The bottom line: any deduplication mechanism that relies on server-side state created during request processing cannot solve the "lost response" problem. The key must originate on the client.
The Decision
Stripe's design centers on a single principle: the client generates a unique key before making the request, and sends it as an Idempotency-Key header. The server guarantees that any request with a previously seen key returns the original result without re-executing.
```http
POST /v1/charges HTTP/1.1
Host: api.stripe.com
Authorization: Bearer sk_live_xxx
Idempotency-Key: order_prod_8f14e45f-ceea-4a3b-9b97
Content-Type: application/x-www-form-urlencoded

amount=5000&currency=usd&customer=cus_NhD8HD2bY8dP3V&description=Order%20%238f14e
```
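On the client side, the request above runs inside a retry loop. The key is generated once, before the first attempt, and reused on every retry. What follows is a minimal TypeScript sketch, not Stripe's SDK code. It assumes a hypothetical SendAttempt function that performs one HTTP attempt and either resolves with the status code or rejects on a network failure. The attempt count and backoff delays are also illustrative assumptions.

```typescript
import { randomUUID } from 'crypto';

// Hypothetical transport: performs one HTTP attempt. Resolves with the
// response status, or rejects on a network failure (timeout, connection
// reset), in which case the outcome of the attempt is unknown.
type SendAttempt = (headers: Record<string, string>, body: string) => Promise<number>;

async function postWithRetries(
  send: SendAttempt,
  body: string,
  maxAttempts = 3,
  baseDelayMs = 100
): Promise<number> {
  // Generate the key ONCE, before the first attempt, so every retry of this
  // logical operation carries the same key.
  const idempotencyKey = randomUUID();
  let lastError: unknown = new Error('no attempts made');

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const status = await send({ 'Idempotency-Key': idempotencyKey }, body);
      // 409: the original attempt is still in flight, so back off and retry.
      if (status !== 409) return status;
      lastError = new Error('409 Conflict: request still in progress');
    } catch (err) {
      // Network failure: the server may or may not have executed the request,
      // but retrying with the same key is safe.
      lastError = err;
    }
    if (attempt < maxAttempts - 1) {
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

If the key were generated inside the loop, each attempt would carry a fresh key, and the duplicate-charge bug would quietly come back.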
Four design choices make this work:
1. Client-generated keys. The client creates the key (typically a UUID or a business-meaningful identifier like order_id) before sending the request. This means the key exists even if the server never receives the first attempt. On retry, the client sends the same key, and the server can match it.
2. Key scoping per API key. Idempotency keys are namespaced to the caller's API key. Two different merchants can independently use the same UUID as an idempotency key without conflict. This is critical for a multi-tenant platform like Stripe.
3. Request body hash matching. Stripe hashes the request body and stores it alongside the idempotency key. If a client retries with the same key but a different request body (different amount, different customer), Stripe returns a 400 Idempotency Key Conflict error. This prevents accidental key reuse across different operations.
Why client-generated beats server-generated
The entire purpose of idempotency keys is to handle the case where the server's response is lost. If the server generates the key, the key lives in the response. If the response is lost, the key is lost. Client-generated keys break this circular dependency because the key exists before the request is sent.
4. Key expiration at 24 hours. Stripe retains idempotency records for 24 hours. After that, the same key executes a new operation. This is a deliberate boundary: a retry within 24 hours is "the same attempt." After 24 hours, it is a new business intent. The expiration also keeps the idempotency store from growing without bound.
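Choices 2 and 3 can be sketched together: scope the stored key by API key, and compare a hash of the request body whenever a key is reused. The sketch canonicalizes the body before hashing by recursively sorting object keys. A plain JSON serialization can differ for logically identical bodies whose fields arrive in a different order. That canonicalization step is an assumption of this sketch, not documented Stripe behavior. KeyRegistry is a hypothetical in-memory stand-in for the idempotency_keys table.

```typescript
import { createHash } from 'crypto';

// Serialize JSON-compatible values with object keys sorted recursively, so
// {amount, currency} and {currency, amount} produce the same string.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${fields.join(',')}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

function requestHash(body: unknown): string {
  return createHash('sha256').update(canonicalJson(body)).digest('hex');
}

type CheckResult = 'new' | 'replay' | 'conflict';

// Hypothetical in-memory stand-in for the idempotency_keys table.
class KeyRegistry {
  private readonly hashes = new Map<string, string>();

  check(apiKeyId: string, idempotencyKey: string, body: unknown): CheckResult {
    // Scope by API key: two merchants may use the same key independently.
    const scoped = JSON.stringify([apiKeyId, idempotencyKey]);
    const hash = requestHash(body);
    const seen = this.hashes.get(scoped);
    if (seen === undefined) {
      this.hashes.set(scoped, hash);
      return 'new';
    }
    // Same key + same body = safe replay; same key + different body = misuse
    return seen === hash ? 'replay' : 'conflict';
  }
}
```

A 'conflict' result corresponds to the 400 error described in choice 3. A 'replay' result is where the stored response would be returned.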
For your interview: when designing any payment API, say "the client generates an Idempotency-Key, the server stores it with the result, and replays the stored response on retry." That sentence alone demonstrates you understand the core pattern.
The Migration Path
Stripe did not introduce idempotency as a big-bang change. The rollout followed a deliberate sequence that minimized risk to live payment processing.
Phase 1: Schema and infrastructure. The team created the idempotency_keys table with a unique index on the key column, scoped by API key. The table was deployed to production but initially unused by the API layer. Rollback plan: drop the table.
Phase 2: Read path (shadow mode). The API began checking for existing idempotency keys on incoming requests but did not enforce the behavior. If a key matched, the system logged the match and continued processing normally. This validated that key lookups were fast and the index performed under production load.
Phase 3: Write path activation. The API began writing idempotency records on completion of mutating requests. At this point, new requests would find existing keys from previous attempts, but the replay behavior was gated behind a feature flag. The team monitored for key collision rates and storage growth.
Phase 4: Replay enforcement. The feature flag was enabled, and the full idempotency guarantee went live. Retried requests with matching keys now returned stored responses without re-executing. This was rolled out API-key by API-key, starting with internal test accounts, then beta partners, then general availability.
Phase 5: Documentation and SDK integration. Stripe's client libraries were updated to automatically generate and send idempotency keys for all POST requests. The API documentation was updated to strongly recommend (and later require) keys for mutating operations.
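The flag-gated phases can be pictured as a single mode switch in the request path. This is a hypothetical TypeScript sketch: the mode names, handleWithMode and ShadowStore are illustrative, not Stripe's code. It omits the pending state and hash check to keep the rollout logic visible.

```typescript
// Illustrative rollout modes mapped to the phases above:
// 'off'     - Phase 1: table exists, API ignores it
// 'shadow'  - Phases 2-3: look up and record keys, but never replay
// 'enforce' - Phase 4: replay stored responses for matching keys
type IdempotencyMode = 'off' | 'shadow' | 'enforce';

interface StoredResponse { status: number; body: unknown }

interface ShadowStore {
  get(key: string): StoredResponse | undefined;
  set(key: string, res: StoredResponse): void;
}

async function handleWithMode(
  mode: IdempotencyMode,
  key: string | undefined,
  store: ShadowStore,
  execute: () => Promise<StoredResponse>,
  log: (msg: string) => void = () => {}
): Promise<StoredResponse> {
  if (mode === 'off' || key === undefined) return execute();

  const stored = store.get(key);
  if (stored !== undefined) {
    if (mode === 'enforce') return stored; // replay without re-executing
    // Shadow mode: record that a replay would have happened, then proceed.
    log(`shadow: key ${key} matched; would have replayed status ${stored.status}`);
  }
  const res = await execute();
  if (stored === undefined) store.set(key, res);
  return res;
}
```

In this sketch, moving an API key from 'shadow' to 'enforce' changes only the replay branch. That is why a per-API-key rollout can be a configuration change rather than a code change.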
Interview tip: phased rollout for correctness-critical changes
Mention that Stripe rolled this out in phases with shadow mode and feature flags. Interviewers love hearing about safe deployment strategies for changes that affect financial correctness. "We shadow-tested under production load before enforcing" is a strong signal.
The entire rollout took approximately three months, with most of that time spent in shadow mode gathering confidence in key lookup performance and collision rates.
The System After
The idempotency system sits as a middleware layer between Stripe's API gateway and the charge processing logic. Every mutating request passes through it before reaching any payment logic.
The schema
```sql
-- Stripe's idempotency_keys table (simplified)
CREATE TABLE idempotency_keys (
  id            BIGSERIAL    PRIMARY KEY,
  api_key_id    BIGINT       NOT NULL,                  -- scopes key to the merchant
  key           VARCHAR(255) NOT NULL,                  -- client-provided idempotency key
  request_hash  VARCHAR(64)  NOT NULL,                  -- SHA-256 of the request body
  request_path  VARCHAR(255) NOT NULL,                  -- e.g., /v1/charges
  response_code INT,                                    -- stored HTTP status code
  response_body JSONB,                                  -- stored response payload
  status        VARCHAR(20)  NOT NULL DEFAULT 'pending', -- 'pending' or 'completed'
  created_at    TIMESTAMP    NOT NULL DEFAULT NOW(),
  UNIQUE (api_key_id, key)                              -- the critical unique constraint
);

-- Index for expiration cleanup
CREATE INDEX idx_idempotency_keys_created_at
  ON idempotency_keys (created_at);
```
The (api_key_id, key) unique index is the core of the entire system. It makes the findOrCreate operation atomic at the database level, guaranteeing that only one request per key ever reaches the processing stage.
The implementation flow
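```typescript
// Simplified Stripe-style idempotency middleware.
// ApiRequest, ApiResponse, and Db are minimal illustrative types; Db matches
// the shape of a node-postgres client's query() method.
import { createHash } from 'crypto';

interface ApiRequest {
  headers: Record<string, string | undefined>;
  body: unknown;
  path: string;
  auth: { apiKeyId: number };
}
interface ApiResponse { status: number; body: unknown }
interface Db { query(text: string, values?: unknown[]): Promise<{ rows: any[] }> }
declare const db: Db; // provided by the application

const sha256 = (s: string): string => createHash('sha256').update(s).digest('hex');

async function idempotencyMiddleware(
  req: ApiRequest,
  next: () => Promise<ApiResponse>
): Promise<ApiResponse> {
  const idempotencyKey = req.headers['idempotency-key'];
  if (!idempotencyKey) return next(); // no key supplied (e.g., GET requests): pass through

  const requestHash = sha256(JSON.stringify(req.body));
  const apiKeyId = req.auth.apiKeyId;

  // Atomic findOrCreate: the UNIQUE (api_key_id, key) constraint lets exactly
  // one concurrent INSERT succeed; every other attempt gets zero rows back.
  const record = await db.query(`
    INSERT INTO idempotency_keys (api_key_id, key, request_hash, request_path)
    VALUES ($1, $2, $3, $4)
    ON CONFLICT (api_key_id, key) DO NOTHING
    RETURNING *
  `, [apiKeyId, idempotencyKey, requestHash, req.path]);

  // If INSERT returned nothing, the key already exists
  if (record.rows.length === 0) {
    const existing = await db.query(
      'SELECT * FROM idempotency_keys WHERE api_key_id = $1 AND key = $2',
      [apiKeyId, idempotencyKey]
    );
    const entry = existing.rows[0];

    // The winning request may have failed and deleted its row between our
    // INSERT and SELECT; tell the client to retry rather than crash.
    if (!entry) {
      return { status: 409, body: { error: 'Request in progress, retry later' } };
    }

    // Same key, different body or endpoint: reject as key misuse
    if (entry.request_hash !== requestHash || entry.request_path !== req.path) {
      return { status: 400, body: { error: 'Idempotency key conflict' } };
    }

    // If still pending, another request with this key is in flight
    if (entry.status === 'pending') {
      return { status: 409, body: { error: 'Request in progress, retry later' } };
    }

    // Completed: replay the stored response without re-executing
    return { status: entry.response_code, body: entry.response_body };
  }

  // First time seeing this key: execute the actual operation
  try {
    const response = await next();
    // Store the outcome (including error responses such as a decline) for replay
    await db.query(`
      UPDATE idempotency_keys
      SET status = 'completed', response_code = $1, response_body = $2
      WHERE api_key_id = $3 AND key = $4
    `, [response.status, JSON.stringify(response.body), apiKeyId, idempotencyKey]);
    return response;
  } catch (err) {
    // On failure, delete the pending record so retries can try again.
    // Caveat: this is only safe if the failure happened before any side
    // effect (such as the card debit) was committed.
    await db.query(
      'DELETE FROM idempotency_keys WHERE api_key_id = $1 AND key = $2',
      [apiKeyId, idempotencyKey]
    );
    throw err;
  }
}
```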
The pending state deserves special attention. When a request is being processed (card network call in flight), the idempotency record exists but has status = 'pending'. If a concurrent retry arrives during this window, Stripe returns a 409 Conflict rather than blocking. The client's retry logic waits and tries again. By then the original request has usually completed, and the stored response is available.
I've seen teams skip the pending state and just use "exists or not" logic. This creates a race condition: two concurrent requests with the same key both fail the lookup, both proceed to process, and you get the exact duplicate you were trying to prevent. The pending state is the lock that prevents this.
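The race is easiest to see in a single process. This is a hypothetical in-memory sketch with illustrative names. Its claim() method checks for a record and inserts a pending one with no await in between. On Node's single-threaded event loop, that step plays the role the unique index plays in the database. Two concurrent requests with the same key produce exactly one execution and one 409.

```typescript
interface Resp { status: number; body: unknown }
type Entry = { status: 'pending' } | { status: 'completed'; response: Resp };

class InMemoryIdempotencyStore {
  private readonly entries = new Map<string, Entry>();

  // Check-and-set with no await in between: atomic on the event loop.
  // Returns the existing entry, or undefined if this caller claimed the key.
  claim(key: string): Entry | undefined {
    const existing = this.entries.get(key);
    if (existing === undefined) this.entries.set(key, { status: 'pending' });
    return existing;
  }

  complete(key: string, response: Resp): void {
    this.entries.set(key, { status: 'completed', response });
  }

  release(key: string): void {
    this.entries.delete(key);
  }
}

async function handleOnce(
  store: InMemoryIdempotencyStore,
  key: string,
  execute: () => Promise<Resp>
): Promise<Resp> {
  const existing = store.claim(key);
  if (existing !== undefined) {
    if (existing.status === 'pending') {
      return { status: 409, body: { error: 'Request in progress, retry later' } };
    }
    return existing.response; // completed: replay
  }
  try {
    const response = await execute(); // only the claimant reaches this line
    store.complete(key, response);
    return response;
  } catch (err) {
    store.release(key); // let a later retry try again
    throw err;
  }
}
```

Suppose claim() were split into a lookup, an await, and a later insert. That would recreate the "exists or not" bug: both requests could pass the lookup before either one inserted its record.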
The idempotency key state machine
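The record's lifecycle fits in a small transition table. This sketch assumes, as the created_at index suggests, that cleanup deletes rows purely by age. Under that assumption, a pending row left behind by a crashed request also expires. A request that finds a pending or completed row does not change the state; it receives a 409 or a replay instead. The type and function names are illustrative.

```typescript
type KeyState = 'absent' | 'pending' | 'completed';
type KeyEvent = 'claim' | 'succeed' | 'fail' | 'expire';

// Transition table for one (api_key_id, key) record; 'absent' means no row.
const transitions: Record<KeyState, Partial<Record<KeyEvent, KeyState>>> = {
  absent: { claim: 'pending' },   // first request inserts the row
  pending: {
    succeed: 'completed',         // response stored for replay
    fail: 'absent',               // row deleted so retries can run
    expire: 'absent',             // stuck row removed by age-based cleanup
  },
  completed: { expire: 'absent' }, // cleanup after 24 hours
};

function nextState(state: KeyState, event: KeyEvent): KeyState {
  const next = transitions[state][event];
  if (next === undefined) throw new Error(`illegal transition: ${event} in state ${state}`);
  return next;
}
```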
The cleanup job runs hourly, deleting records older than 24 hours. At Stripe's scale, this means millions of records expire daily, keeping the table size bounded and index performance consistent.
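A cleanup job over a large table is typically run in bounded batches rather than as one huge DELETE. This avoids locking, and generating write-ahead log for, millions of rows in a single statement. The TypeScript sketch below runs against a hypothetical node-postgres-style Exec function that reports rowCount. It deletes expired rows through the created_at index, 10,000 at a time; the batch size is an arbitrary assumption.

```typescript
// Hypothetical query runner: returns the number of rows the statement affected.
type Exec = (sql: string, params: unknown[]) => Promise<{ rowCount: number }>;

const PURGE_SQL = `
  DELETE FROM idempotency_keys
   WHERE id IN (
     SELECT id FROM idempotency_keys
      WHERE created_at < NOW() - INTERVAL '24 hours'
      LIMIT $1)`;

// Delete expired keys batch by batch until a partial batch signals we are done.
async function purgeExpiredKeys(exec: Exec, batchSize = 10_000): Promise<number> {
  let total = 0;
  for (;;) {
    const { rowCount } = await exec(PURGE_SQL, [batchSize]);
    total += rowCount;
    if (rowCount < batchSize) return total;
  }
}
```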
Beyond payments: where this pattern applies
Idempotency keys are not payment-specific. Any operation that has side effects, is called over a network, and should execute at-most-once benefits from this pattern.
Email sending. A notification service that retries on timeout without idempotency sends the same email twice. Users receive duplicate "Your order shipped" emails. Client-generated message IDs with server-side dedup solve this.
Webhook delivery. When Stripe itself delivers webhooks to merchant endpoints, it uses a similar pattern in reverse. Each webhook event has a unique event_id. The merchant's handler should store processed event IDs and skip duplicates.
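On the receiving end, deduplication is a check-then-record on the event's unique ID. This is a minimal sketch with a hypothetical WebhookConsumer class. An in-memory Set stands in for a table with a unique constraint on the event ID.

```typescript
interface WebhookEvent { id: string; type: string; payload: unknown }

// Hypothetical consumer. In production, processed IDs would live in a table
// with a unique constraint on the event ID, written in the same database
// transaction as the side effect; a Set stands in for that table here.
class WebhookConsumer {
  private readonly processed = new Set<string>();
  private readonly apply: (event: WebhookEvent) => void;

  constructor(apply: (event: WebhookEvent) => void) {
    this.apply = apply;
  }

  handle(event: WebhookEvent): 'processed' | 'duplicate' {
    if (this.processed.has(event.id)) return 'duplicate'; // redelivery: skip
    this.apply(event); // if this throws, the ID is not recorded and a redelivery retries it
    this.processed.add(event.id);
    return 'processed';
  }
}
```

The HTTP layer should return a success status for duplicates too. An error response tells the sender to keep redelivering.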
Ledger entries. Any double-entry bookkeeping system (accounting, inventory, credits) cannot tolerate duplicate writes. An idempotency key per logical transaction prevents a retry from creating a second debit or credit.
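For a ledger, the idempotency key belongs to the logical transaction, not to each row. Both legs of a transfer are therefore written, or skipped, together. This is a hypothetical in-memory sketch in TypeScript, with amounts in integer cents.

```typescript
interface LedgerEntry { txKey: string; account: string; amountCents: number }

// Hypothetical in-memory ledger. In a database, the two inserts would share one
// transaction and txKey would carry a unique constraint on a transactions table.
class Ledger {
  private readonly entries: LedgerEntry[] = [];
  private readonly postedTx = new Set<string>();

  // Posts a debit and a matching credit under one idempotency key.
  // Returns false, and writes nothing, if the key was already posted.
  postTransfer(txKey: string, from: string, to: string, amountCents: number): boolean {
    if (this.postedTx.has(txKey)) return false; // retry of a posted transaction
    this.entries.push(
      { txKey, account: from, amountCents: -amountCents },
      { txKey, account: to, amountCents }
    );
    this.postedTx.add(txKey);
    return true;
  }

  balance(account: string): number {
    return this.entries
      .filter((e) => e.account === account)
      .reduce((sum, e) => sum + e.amountCents, 0);
  }
}
```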
Async job enqueuing. A web request that enqueues a background job (send report, generate PDF) can use idempotency keys to prevent duplicate job creation when the HTTP response is lost.
The pattern generalizes: anywhere you have "fire, lose the response, retry," you need client-generated dedup keys.
The Results
| Metric | Before Idempotency | After Idempotency |
|---|---|---|
| Duplicate charge rate | ~0.1% of retried requests | 0% (by design) |
| Support tickets for double-charges | Hundreds per month | Near zero |
| Client retry safety | Client must implement own dedup | Built into API contract |
| SDK complexity for retries | Custom per-client logic | Automatic in all Stripe SDKs |
| Average dispute/chargeback from duplicates | ~$50K/month industry-wide | Eliminated for Stripe users |
| API response overhead for idempotent replay | N/A (no replay existed) | < 5ms (single indexed lookup) |
The financial impact was significant. Duplicate charges generate chargebacks, which cost $15-25 each in processing fees regardless of outcome. For a platform processing millions of transactions daily, even a 0.01% duplicate rate adds up to substantial losses and operational overhead.
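To make the 0.01% figure concrete, assume 1 million charges per day, the low end of "millions". Also assume, as an upper bound, that every duplicate turns into a chargeback costing the $15 to $25 fee quoted above:

$$
\begin{aligned}
\text{duplicates/day} &= 10^{6} \times 10^{-4} = 100 \\
\text{fees/day} &= 100 \times (\$15 \text{ to } \$25) = \$1{,}500 \text{ to } \$2{,}500 \\
\text{fees/month} &\approx 30 \times (\$1{,}500 \text{ to } \$2{,}500) = \$45{,}000 \text{ to } \$75{,}000
\end{aligned}
$$

This counts chargeback fees only. It excludes the refunded amounts and the support time each case consumes.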
Beyond the direct metrics, the developer experience improvement was transformative. Before idempotency keys, every Stripe integration needed custom retry logic with exponential backoff and manual deduplication. After, Stripe's SDKs handle retries transparently. A developer can write stripe.charges.create(params) and know that network failures are handled automatically.
My recommendation for any API designer: if your API has side effects, idempotency keys are not optional. They are table stakes.
Industry adoption after Stripe
Stripe's design became the blueprint. Within a few years, the Idempotency-Key header pattern spread across the payments industry and beyond.
| Company | Idempotency mechanism | Key differences from Stripe |
|---|---|---|
| Stripe | Idempotency-Key header, 24h expiry | The original. Client-generated, hash-validated |
| Square | idempotency_key in request body | Body-level instead of header, same semantics |
| Adyen | reference field on payment request | Business-key style, no separate idempotency header |
| Amazon Pay | ReferenceId per operation | Scoped per operation type, not per API key |
| Google Pay API | Server-side dedup on transactionId | Server assigns ID, requires two-step flow |
The header-based, client-generated approach that Stripe pioneered remains the cleanest design. Google's two-step flow adds latency and complexity. Adyen's business-key approach conflates the idempotency concern with the domain model. Stripe kept them separate, which is the right call.
What They'd Do Differently
Stripe's engineering team has spoken publicly about lessons learned from the idempotency system.
Require keys from day one. Initially, idempotency keys were optional. Some early integrators never adopted them, and their customers bore the consequences. If Stripe could go back, they would make the header mandatory for all POST endpoints from the initial API launch.
Shorter initial expiration window. The 24-hour window was generous and created storage pressure. A 1-hour window would have been sufficient for retry scenarios while reducing table size by 24x. They kept 24 hours because changing it would break existing client assumptions, a classic backward-compatibility tax.
Standardize key format. Stripe accepts any string up to 255 characters as an idempotency key. In practice, some clients used sequential integers, some used UUIDs, some used human-readable strings. A stricter format (UUID-v4 only) would have prevented collisions from poorly-chosen keys and made debugging easier. They could not enforce this retroactively without breaking existing integrations.
These are all backward-compatibility tradeoffs. Get the API contract right the first time, because you will live with it for years.
I've made the "optional safety" mistake myself on internal APIs. By the time you realize it should have been mandatory, half your consumers have built retry logic without it, and making it required becomes a breaking change. Default to mandatory.
Architecture Decision Guide
If your API charges money, transfers funds, or modifies inventory, implement the full pattern: client-generated keys, request hash validation, pending state, and SDK-level auto-retry. Everything else is cutting corners on correctness.
Transferable Lessons
1. The client must own the deduplication key. Any deduplication mechanism where the server generates the identifier fails when the server's response is lost. Stripe's core insight was recognizing that the key must exist before the request is sent. This principle applies to any distributed system with at-most-once semantics: message queues, event processors, webhook deliveries. If the sender generates the dedup key before the first attempt, retries are safe by construction.
2. "Pending" is a state, not an absence. The difference between "no record exists" and "a record exists but processing is incomplete" is critical. Stripe's pending state acts as a distributed lock, preventing concurrent execution of the same logical operation. I've seen teams treat "key not found" and "key found but incomplete" identically, and it always leads to race-condition bugs under load. Model the intermediate state explicitly.
3. Validate intent, not just identity. Storing a hash of the request body alongside the idempotency key prevents a subtle class of bugs: accidental key reuse across different operations. Without hash validation, a client that reuses a key for a different charge would silently receive the wrong cached response. Intent validation (same key + same request = safe replay, same key + different request = error) is a pattern worth copying in any idempotent API.
4. Expiration is a design decision, not a cleanup detail. Stripe's 24-hour expiration defines the boundary between "retry" and "new operation." This is a product decision disguised as an infrastructure choice. Too short, and legitimate retries fail. Too long, and storage grows unbounded. Pick the window based on your longest reasonable retry scenario, then document it in your API contract.
5. Make correctness the default, not an opt-in. Stripe's biggest regret was making idempotency keys optional initially. When safety is opt-in, some users will opt out (intentionally or through ignorance) and bear the consequences. Build the safety mechanism into the SDK so developers get correct behavior without thinking about it. This principle extends far beyond idempotency: encryption, input validation, rate limiting. If users have to remember to enable safety, some of them will not.
How This Shows Up in Interviews
Any system design involving payments, e-commerce, or financial transactions will expect you to address idempotency. Here is how to use this knowledge.
The sentence: "Every mutating endpoint accepts a client-generated Idempotency-Key. The server stores the key with the response and replays it on retry, using a request hash to prevent key misuse."
| Interviewer asks | Strong answer citing Stripe's approach |
|---|---|
| "How do you handle duplicate payments?" | "Client-generated idempotency keys, stored server-side with the result. Same key returns cached response. This is how Stripe handles it at scale." |
| "What if the client retries and the first request is still processing?" | "The idempotency record has a 'pending' state. Concurrent retries get a 409 response and retry after a short backoff, like Stripe's design." |
| "Why not deduplicate by amount + timestamp?" | "Legitimate duplicates exist (same amount, same customer, seconds apart). Only a client-generated key can distinguish retry from new intent." |
| "How do you prevent idempotency key abuse?" | "Hash the request body and store it with the key. If the body changes but the key is reused, return a 400 conflict. Scope keys per API key for multi-tenancy." |
| "What about key storage growing forever?" | "Expire keys after 24 hours with a background cleanup job. Stripe does this. It bounds storage while covering all reasonable retry windows." |
Interview tip: name the table schema
Mentioning the idempotency_keys table with columns like key, request_hash, response_body, and status shows you understand the implementation, not just the concept. Sketch it on the whiteboard when discussing API design.
Quick Recap
- Network failures create an unavoidable "did it succeed?" problem for any API with side effects, and retrying without idempotency causes duplicate execution.
- Stripe solves this with client-generated idempotency keys: the client owns the dedup key before the request is sent, making retries safe even when the server's response is lost.
- The server stores each key in an idempotency_keys table with a unique index, using an atomic findOrCreate (UPSERT) to guarantee at-most-once execution.
- A pending intermediate state prevents concurrent retries from both executing the operation, acting as a distributed lock during processing.
- Request body hash validation prevents accidental key reuse across different operations, catching misuse before it causes silent data corruption.
- Keys expire after 24 hours, bounding storage growth and defining the semantic boundary between "retry" and "new operation."
- The transferable principle: in any distributed system with at-most-once requirements, the deduplication key must originate on the client side, and the intermediate "in-progress" state must be modeled explicitly.
Related Concepts
- Missing idempotency covers the anti-pattern of building APIs without idempotency and the failure modes that result, exactly what Stripe's design prevents.
- Outbox pattern solves a complementary problem: ensuring that a database write and an event publication happen atomically, often used alongside idempotency keys in payment systems.
- Two-phase commit is the traditional distributed transaction protocol that idempotency keys sidestep by shifting the deduplication responsibility to the client rather than coordinating across services.