Missing idempotency anti-pattern
Learn why operations without idempotency keys cause duplicate side effects on retry, how to design idempotent APIs, and why this matters most in payment and notification systems.
TL;DR
- An operation is idempotent if applying it multiple times produces the same result as applying it once.
- `PUT /users/123 { name: "Alice" }` is idempotent. `POST /payments { amount: 100 }` without an idempotency key is not.
- Without idempotency, retries cause duplicate side effects: money charged twice, emails sent three times, orders created in duplicate.
- The solution is an idempotency key: the client generates a unique token for each logical operation. The server uses it to detect and short-circuit duplicate requests.
- This pattern is critical anywhere retries happen: payment APIs, notification pipelines, job submissions, and any message consumer processing a queue.
- Store idempotency keys in PostgreSQL for financial operations (durability). Use Redis with TTL for lower-stakes deduplication (speed).
The Problem
It's 2:47 a.m. on Black Friday. Your mobile app calls POST /payments to charge a user $99. The server processes the payment, charges the credit card, and starts sending the response. The network drops. The mobile app never receives the 200 response.
Should the app retry? If the payment succeeded, retrying will charge the user $198. If the payment failed (the request never reached the server), not retrying means the user never completes their purchase. The app can't tell which happened. It has to guess.
I've been on the receiving end of this exact scenario. The on-call engineer sees a flood of "duplicate charge" support tickets. The fix requires correlating payment logs, issuing manual refunds, and sending apology emails. All because the mobile client retried a request the server had already processed.
If the app retries without idempotency, you will eventually double-charge users. If it doesn't retry, you will eventually lose sales to transient network drops. At 100K payments per day with a 1% network failure rate, that's 1,000 ambiguous requests daily. Even if only 10% of those produce duplicates, that's 100 double-charges per day.
The fundamental problem: HTTP doesn't tell you whether the server processed your request. A timeout means "I don't know what happened," not "the request failed." Without an explicit deduplication mechanism, the client must choose between data loss and data duplication.
The failure math
Consider a payment API processing 100,000 requests per day with 3 retries per failure:
- Network failure rate (mobile): ~1% = 1,000 ambiguous requests/day
- Probability that the request actually succeeded before the drop: ~50%
- Expected duplicates without idempotency: ~500 double-charges per day
- With 3 retries, worst case: up to 1,500 extra charges per day
At $50 average transaction value, that's $25,000-$75,000 in erroneous charges daily. Even with immediate refunds, the customer support cost, trust damage, and chargeback fees make this a P0 incident.
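The arithmetic above can be reproduced in a few lines. The numbers are the illustrative assumptions from this section, not measurements:

```typescript
// Expected duplicate charges per day, using the assumptions stated above.
const requestsPerDay = 100_000;
const networkFailureRate = 0.01;  // ~1% ambiguous requests on mobile networks
const pSucceededBeforeDrop = 0.5; // coin flip: did the server process it?
const maxRetries = 3;
const avgTransaction = 50;        // dollars

const ambiguous = requestsPerDay * networkFailureRate;       // 1,000/day
const expectedDuplicates = ambiguous * pSucceededBeforeDrop; // ~500/day
const worstCaseDuplicates = expectedDuplicates * maxRetries; // 1,500/day

console.log(expectedDuplicates * avgTransaction);  // 25000
console.log(worstCaseDuplicates * avgTransaction); // 75000
```

Tweak the failure rate and retry count for your own traffic; the point is that the duplicate count scales linearly with both.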
Stripe (whose payments API is the gold standard for API design) solved this in 2013 with the Idempotency-Key header. The client generates a UUID for each payment attempt. If the same key arrives twice, Stripe returns the same response it returned the first time.
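A minimal sketch of the header-based pattern. The `uuidv4` and `buildPaymentRequest` helpers here are illustrative assumptions, not Stripe's SDK; in production you'd use `crypto.randomUUID()` and your HTTP client of choice:

```typescript
// Sketch: attach an Idempotency-Key header to a payment request.
// uuidv4() is a simplified stand-in for crypto.randomUUID().
function uuidv4(): string {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0;
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Hypothetical helper: builds the request, stamping a fresh key on it.
function buildPaymentRequest(amount: number): { headers: Record<string, string>; body: string } {
  return {
    headers: {
      'Content-Type': 'application/json',
      'Idempotency-Key': uuidv4(), // the SAME key must be resent on every retry
    },
    body: JSON.stringify({ amount }),
  };
}
```

The crucial discipline is on retry: resend the same key, never a fresh one. A new key per attempt makes every retry look like a new payment and defeats the whole mechanism.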
Why It Happens
Five individually reasonable decisions compound into this anti-pattern. Each makes sense in isolation; together they create a duplicate-side-effect time bomb.

1. "We'll add retry logic later." The payment endpoint ships without idempotency. When timeouts appear in production, someone wraps the call in a retry loop. Now the system retries, but the server can't tell a retry from a new request.
2. "The network is reliable enough." In development, requests succeed 99.99% of the time. In production with mobile clients on cellular networks, 1-2% of requests fail at the TCP level. At 100K payments per day, that's 1,000-2,000 potential double-charges daily.
3. "The database has unique constraints." True for some columns, but a unique constraint on `(user_id, amount)` doesn't prevent a legitimate second purchase for the same amount. You need a unique constraint on a client-generated operation identifier.
4. "HTTP is request-response, so it's safe." Developers assume that if the client didn't get a response, the server didn't process the request. But the failure can happen after processing, before the response reaches the client. I've seen this exact assumption cause a double-charge incident at 3 a.m. on a Sunday.
5. "We'll deduplicate on the consumer side." Queue consumers assume the queue delivers each message exactly once. Kafka and SQS both deliver at-least-once. Without consumer-level idempotency, every redelivery is a duplicate side effect.
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| Duplicate records with identical data but different IDs | POST without deduplication | SELECT user_id, amount, DATE(created_at) AS day, COUNT(*) FROM payments GROUP BY user_id, amount, DATE(created_at) HAVING COUNT(*) > 1 |
| Customer complaints about double charges | Retry hit a non-idempotent endpoint | Search payment logs for same user + same amount within 60 seconds |
| Duplicate emails or notifications | Message consumer processed same event twice | Check notification logs for duplicate message_id values |
| Kafka consumer lag spikes followed by duplicate processing | Rebalance triggered redelivery | Monitor consumer group offsets and compare with side-effect logs |
| HTTP 5xx spikes correlated with duplicate DB rows | Retry storms hitting non-idempotent endpoints | Cross-reference error rate graphs with duplicate-detection queries |
Code smell: Any POST endpoint that creates a side effect (charge, email, job submission) without accepting a client-generated unique key.
Architecture smell: Retry middleware wrapping calls to services that don't support idempotency. I once found a retry decorator on a payment client that had been silently double-charging users for weeks.
Quick audit: Search your codebase for retry loops (retry, backoff, attempt) and check whether the target endpoint accepts an idempotency key. Any retry without an idempotency key is a duplicate-side-effect bug waiting to happen.
// BAD: retry wrapper on non-idempotent endpoint
const result = await retry(3, () => api.post('/payments', { amount: 99 }));

// GOOD: retry with idempotency key
const key = generateUUID();
const result = await retry(3, () =>
  api.post('/payments', { amount: 99, idempotencyKey: key })
);
The Fix
The core idea: give every operation a unique identity so the server can tell "new request" from "retry of the same request."
Three fixes, depending on where in the stack you're working: client-generated keys for API endpoints, server-side dedup stores for persistence, and ON CONFLICT for message consumers.
Fix 1: Client-generated idempotency key
Generate a UUID for each logical operation before sending the request. Use the same UUID on all retries of that same logical operation. Generate a new UUID for a new operation.
The key insight: the client, not the server, generates the key. The client knows whether it's retrying the same operation or starting a new one. The server can't make that distinction without the key.
async function chargePayment(userId: string, amount: number): Promise<Receipt> {
  const idempotencyKey = generateUUID(); // generate ONCE per user intent
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await paymentApi.post('/payments', {
        userId,
        amount,
        idempotencyKey // same key on every retry
      });
    } catch (e) {
      if (e.status === 400) throw e; // bad request, don't retry
      await sleep(exponentialBackoff(attempt));
    }
  }
  throw new Error('Payment failed after 3 attempts'); // all retries exhausted
}
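The snippet above leaves `generateUUID`, `sleep`, and `exponentialBackoff` undefined. One plausible sketch of those helpers (assumptions, not part of any particular library):

```typescript
import { randomUUID } from 'crypto';

// Client-side idempotency key: any collision-resistant unique string works.
function generateUUID(): string {
  return randomUUID();
}

// Promise-based delay, for awaiting between retry attempts.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Exponential backoff with jitter: 200ms base, doubling per attempt,
// capped at 5s, with the upper half randomized to avoid retry storms.
function exponentialBackoff(attempt: number): number {
  const base = 200;  // ms
  const cap = 5_000; // ms
  const backoff = Math.min(cap, base * 2 ** attempt); // 200, 400, 800, ...
  return backoff / 2 + Math.random() * (backoff / 2); // add jitter
}
```

The jitter matters more than it looks: if thousands of clients retry on the same schedule after an outage, synchronized retries become their own incident.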
Fix 2: Server-side deduplication store
The server checks the idempotency key against a persistent store before processing. If the key exists, return the cached response. If not, process the request and store the key + response atomically.
The idempotency table schema:
CREATE TABLE idempotency_keys (
key TEXT PRIMARY KEY,
response JSONB NOT NULL,
status TEXT NOT NULL DEFAULT 'completed', -- 'completed' or 'failed'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + INTERVAL '7 days'
);
-- Clean up expired keys daily
CREATE INDEX idx_idempotency_expires ON idempotency_keys (expires_at);
async function processPayment(req: PaymentRequest): Promise<Receipt> {
  const existing = await db.idempotency.findByKey(req.idempotencyKey);
  if (existing) {
    // Duplicate request: return cached response
    return existing.response;
  }
  // First time: process and store result.
  // NOTE: this check-then-act sequence races under concurrent retries;
  // the unique constraint on `key` (or a transaction) must close that gap.
  const receipt = await chargeCard(req);
  await db.idempotency.save({
    key: req.idempotencyKey,
    response: receipt,
    createdAt: new Date(),
    expiresAt: addDays(new Date(), 7) // clean up after 7 days
  });
  return receipt;
}
The idempotency store requires a unique constraint on key. The insert and the payment charge should be in the same database transaction, or use the outbox pattern to ensure they're atomic.
Choosing a storage backend
| Backend | Latency | Durability | Best for |
|---|---|---|---|
| PostgreSQL (same DB as business data) | ~1-5ms | Full ACID | Payments, transfers, any financial operation |
| Redis with TTL | < 1ms | At-risk on restart | Notification dedup, API rate limiting |
| DynamoDB with TTL | ~5-10ms | Durable | Serverless architectures, Lambda idempotency |
For payment systems, co-locate the idempotency key in the same database as the business data. This lets you do an atomic transaction: insert the key and process the payment in one commit. If the transaction fails, neither the key nor the payment persists. If it succeeds, both persist. No partial states.
Fix 3: Kafka consumer idempotency
For message consumers, use INSERT ... ON CONFLICT to deduplicate at the processing level:
async function handleOrderEvent(event: OrderEvent): Promise<void> {
  // Use event ID as natural idempotency key
  const result = await db.query(
    `INSERT INTO processed_events (event_id, processed_at)
     VALUES ($1, NOW())
     ON CONFLICT (event_id) DO NOTHING
     RETURNING event_id`,
    [event.id]
  );
  if (result.rowCount === 0) {
    // Already processed, skip
    return;
  }
  // First time processing this event
  await sendConfirmationEmail(event.orderId);
  await updateOrderStatus(event.orderId, 'confirmed');
}
This works even when a consumer rebalance (or auto-committed offsets via `enable.auto.commit`) causes redelivery. The database, not the consumer offset, is the source of truth for what has been processed.
For high-throughput consumers, batch the inserts: check a batch of event IDs against the processed_events table in one query, then only process the ones that haven't been seen.
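A sketch of that batched variant, with an in-memory `Set` standing in for the `processed_events` table. In the real system this would be a single `INSERT ... ON CONFLICT ... RETURNING` over all the IDs:

```typescript
// Batch deduplication sketch: claim a batch of event IDs at once, then
// process only the ones not already seen. The Set stands in for the
// processed_events table.
const processedEvents = new Set<string>();

function claimUnprocessed(eventIds: string[]): string[] {
  const fresh: string[] = [];
  for (const id of eventIds) {
    if (!processedEvents.has(id)) {
      processedEvents.add(id); // in SQL: INSERT ... ON CONFLICT DO NOTHING
      fresh.push(id);
    }
  }
  return fresh; // only these events get their side effects executed
}
```

The win is one round trip to the dedup store per batch instead of one per event, which matters when a rebalance redelivers thousands of messages at once.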
Where idempotency matters
Anywhere retries exist, idempotency matters:
| System | Without idempotency | With idempotency |
|---|---|---|
| Payment API | User charged twice | Second charge rejected |
| Email notification | User receives 3 "welcome" emails | Exactly one email delivered |
| Job scheduler | Same job runs twice | Duplicate submission ignored |
| Kafka consumer | DB insert executed twice | ON CONFLICT DO NOTHING handles replay |
| Webhook delivery | Downstream called twice on retry | Downstream deduplicates on delivery ID |
The Kafka/message queue case is especially important. Consumer group rebalancing causes messages to be redelivered. Consumers must be idempotent at the processing level, not just at the queue level.
Choosing your idempotency strategy
Natural vs engineered idempotency
Some operations are naturally idempotent:
- `PUT /resources/{id}` replaces the full resource with the same data: same result regardless of repeat count.
- `DELETE /resources/{id}` removes the resource. Deleting twice is the same as deleting once.
Some require engineering to be idempotent:
- `POST /payments` needs an idempotency key.
- `POST /emails` needs a check for prior sends by message ID.
- Counter increments need conditional updates or CRDTs.
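For the counter case, one engineered approach is to record which operation IDs have already been applied. A sketch, under the assumption that each increment carries a client-generated op ID (the `Set` stands in for a persistent dedup store):

```typescript
// Idempotent counter: each increment carries an operation ID, and an
// increment is applied at most once per ID, so retries are harmless.
class IdempotentCounter {
  private value = 0;
  private applied = new Set<string>();

  increment(opId: string, delta: number): number {
    if (!this.applied.has(opId)) {
      this.applied.add(opId);
      this.value += delta;
    }
    return this.value; // same result whether or not this call was a retry
  }
}
```

Note the contrast with a naive `value += delta`: a retried naive increment moves the counter twice, while this version converges to the same value no matter how many times each operation is delivered.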
Severity and Blast Radius
Severity: High. Duplicate side effects directly impact users and are difficult to reverse. A double charge requires a manual refund, customer support contact, and trust erosion. At scale, this becomes a compliance and legal risk.
Blast radius: Every retry path without idempotency is a potential duplicate. The impact varies by domain:
| Domain | Duplicate impact | Reversibility |
|---|---|---|
| Payments | Financial loss, user trust damage | Moderate (manual refund) |
| Notifications | User annoyance, spam complaints | Irreversible (email already sent) |
| Job scheduling | Wasted compute, data corruption | Hard (output may be consumed) |
| Webhooks | Downstream confusion, double processing | Depends on downstream idempotency |
| Inventory | Overselling, stock discrepancy | Hard (physical goods may ship) |
Recovery difficulty: Moderate to hard. Detecting duplicates after the fact requires log correlation across multiple systems. Reversing side effects (refunds, notification retractions) is operationally expensive. Some side effects (sent emails, triggered webhooks, shipped packages) cannot be reversed at all.
The worst part: without idempotency instrumentation, you may not even know duplicates are happening until a customer complains.
When It's Actually OK
- Read-only operations. GET requests are naturally idempotent. No key needed.
- Naturally idempotent writes. `PUT /users/123 { name: "Alice" }` sets the resource to the same state regardless of how many times you call it. `DELETE /resources/123` is also naturally idempotent (the second delete is a no-op).
- Low-stakes fire-and-forget operations. Logging, analytics events, and telemetry data can tolerate some duplication without business impact. Your analytics pipeline probably already deduplicates.
- Internal services with exactly-once producer guarantees. Kafka with `enable.idempotence=true` handles producer retries. But consumer processing still needs idempotency, since rebalancing triggers redelivery.
- Prototyping and early-stage products. If you have 100 users and no payment processing, idempotency keys are premature optimization. Add them when you add retries or when the stakes increase.
The bottom line: if the operation has side effects, retries can happen, and the stakes are non-trivial, you need idempotency. When in doubt, add the key. The overhead is one UUID per request and one database lookup.
How This Shows Up in Interviews
In any design with payments, notifications, or queued message processing, explicitly address idempotency. A strong answer sounds like: "All retry paths carry an idempotency key. The server checks for a prior execution with that key before processing. This prevents duplicate charges, duplicate notifications, and duplicate job executions."
Interviewers look for three signals:
- You identify which operations have side effects and which are naturally idempotent
- You explain where the key is generated (client) and where it's checked (server)
- You mention atomic storage of the key with the business operation
This matters most in payment system designs, notification systems, and any architecture that uses message queues; it's a material signal that you understand the reliability requirements of real distributed systems. I've seen candidates instantly level up an answer by saying "and the payment endpoint accepts an idempotency key" during a system design round.
Without idempotency, your retry logic is your bug
Adding retries to a non-idempotent operation doesn't improve reliability. It trades one failure mode (no retry on drop) for another (duplicate side effect on retry). Both are bugs. Idempotency is what makes retries safe.
Quick Recap
- Non-idempotent operations + retries = duplicate side effects. This is a design bug, not a runtime edge case.
- The idempotency key pattern: the client generates a UUID per logical operation, the server stores the key and returns the cached response on duplicates.
- The idempotency store needs a unique constraint on the key and should be atomically updated with the operation it protects.
- Kafka consumers must be idempotent because messages are redelivered on rebalance. Use `INSERT ... ON CONFLICT DO NOTHING` at the processing layer.
- Naturally idempotent operations (PUT, DELETE) don't need keys. POST endpoints that create side effects always do.
- Store idempotency keys in PostgreSQL for financial operations (durability) and Redis for lower-stakes deduplication (speed).
- The key is generated by the client because only the client knows whether this is a new operation or a retry of a previous one.