Wire Transfer
Walk through a complete wire transfer design, from a naive single-database transaction to a production-grade distributed payment system that handles concurrent transfers, idempotent retries, and thousands of business payments per second.
What is a wire transfer API?
A wire transfer moves money from one bank account to another, directly and irrevocably. Unlike card payments, there is no chargeback network to catch mistakes. The engineering challenge sounds deceptively simple: debit one account, credit another, never do it twice. What makes it hard is that "never do it twice" and "always do both" must hold even when services crash mid-transfer, clients retry after a network timeout, and thousands of business payments target the same account simultaneously.
This question tests your grasp of distributed transactions, idempotency, optimistic vs pessimistic concurrency, and the design tradeoffs that separate a prototype from a system banks and payment processors actually trust.
Functional Requirements
Core Requirements
- Initiate a transfer from a source account to a destination account.
- Debit and credit atomically -- partial execution must never occur.
- Return a transfer ID for status tracking.
- Support transfer status queries and idempotent retries.
Below the Line (out of scope)
- FX conversion and multi-currency. Currency conversion touches regulatory compliance and interbank rate APIs -- a separate design. To add it later, we would insert a currency conversion step before the debit, locking the exchange rate atomically with the reservation.
- Fraud and compliance checks. OFAC screening, velocity limits, and AML checks belong in a separate fraud service invoked synchronously before we touch the ledger. We call out the integration point but do not design its internals.
- User authentication and account management. We assume the caller is authenticated and the account IDs are already verified. A real system would validate JWT claims and check account ownership before any ledger operation.
The hardest part in scope: Ensuring a transfer is applied exactly once even when the client retries (idempotency) and preventing concurrent debits from overdrafting the same account (concurrency control). Every design decision in this article traces back to these two constraints.
Non-Functional Requirements
Core Requirements
- Atomicity: A transfer must fully complete or fully roll back. No partial state where the source is debited but the destination is not credited. This eliminates any eventual-consistency approach for the debit-credit pair.
- Exactly-once execution: A transfer initiated with the same idempotency key must be applied at most once, regardless of how many times the client retries. This is the hardest constraint to satisfy in a distributed system.
- Latency: Transfer initiation returns a response within 500ms p99. The synchronous path (debit + credit + ledger write) must fit inside this window for same-bank transfers.
- Throughput: 5,000 transfers per second peak. Business payroll runs and batch settlements create short, intense write bursts that dwarf average load.
- Availability: 99.99% uptime (roughly 52 minutes of downtime per year). Availability is prioritised over strict read consistency for status checks, because transfer status only moves forward and a slightly stale "PENDING" read is harmless.
- Durability: Once a transfer is confirmed as COMPLETED, it must survive any single node failure including the database primary.
Below the Line
- Sub-100ms latency for cross-bank transfers (bank networks add their own 100ms+ overhead).
- Global multi-region active-active writes with sub-10ms cross-region replication lag.
Read/write ratio: Transfer initiation (write) to status polling (read) is roughly 1:3. Reads outnumber writes, but not by enough to make this a read-scaling problem: status reads are cacheable and tolerate replica lag. The dominant challenge is write correctness, not read scale. Design decisions prioritise serialisability and durability over throughput optimisations like denormalisation or aggressive caching.
Core Entities
- Account: A ledger account with an owner, a current balance checkpoint, and a currency. The primary entity that transfers debit and credit.
- Transfer: The record of a single money movement -- source account, destination account, amount, currency, status (PENDING, PROCESSING, COMPLETED, FAILED), and a link to the idempotency key that initiated it.
- LedgerEntry: An immutable debit or credit line associated with a transfer. Every transfer produces exactly two entries: one debit from the source, one credit to the destination. Never updated, only inserted.
- IdempotencyRecord: Maps a client-supplied idempotency key (plus caller identity) to a transfer ID and the original response body. Prevents duplicate execution on retry.
Schema details (column types, indexes, constraints) are deferred to the Data Model deep dive. The key relationship: Transfer links to two LedgerEntry rows, and the Account row holds a running balance checkpoint derived from the append-only ledger entries.
API Design
Start with one endpoint per functional requirement, then evolve where the naive shape breaks down.
FR 1 -- Initiate a transfer (naive shape):
POST /transfers
Body: { source_account_id, destination_account_id, amount, currency }
Response: { transfer_id, status }
This works but has a fatal flaw: if the client's TCP connection drops after the debit executes but before the response arrives, the client cannot know whether the transfer happened. A retry would debit the source a second time.
FR 1 -- Initiate a transfer (evolved shape with idempotency):
POST /transfers
Headers: Idempotency-Key: <client-generated UUID>
Body: { source_account_id, destination_account_id, amount, currency }
Response: { transfer_id, status, created_at }
The Idempotency-Key is client-generated (a UUID the caller creates before the first attempt and reuses on every retry). If the server has already seen this key, it returns the original response without re-executing. This is the standard pattern used by Stripe, Adyen, and every serious payment API.
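A minimal sketch of the client side of this contract, assuming a generic `send` callable standing in for the HTTP layer: the key is generated once, before the first attempt, and reused verbatim on every retry.

```python
import uuid

def transfer_with_retry(send, payload, max_attempts=3):
    """Issue a transfer, reusing one idempotency key across every retry.

    `send` is any callable taking (headers, payload) and returning a
    response dict, or raising TimeoutError on a network timeout. Because
    the key is generated before the first attempt, a retry after a
    dropped response is recognised by the server as a duplicate rather
    than a new transfer.
    """
    key = str(uuid.uuid4())  # generated once, reused on every attempt
    headers = {"Idempotency-Key": key}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, payload)
        except TimeoutError as exc:  # ambiguous outcome: safe to retry
            last_error = exc
    raise last_error
```

The common client-side bug this avoids is generating a fresh key inside the retry loop, which turns every retry into a brand-new transfer from the server's point of view.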
FR 2 -- Query transfer status:
GET /transfers/{transfer_id}
Response: { transfer_id, source_account_id, destination_account_id, amount, currency, status, created_at, completed_at }
GET because this is a pure read with no side effects. Both created_at and completed_at are returned so clients can distinguish "still processing" from "timed out and stuck". Status is one of PENDING, PROCESSING, COMPLETED, FAILED.
FR 3 -- List transfers for an account:
GET /accounts/{account_id}/transfers?limit=50&cursor=<opaque>
Response: { transfers: [...], next_cursor: "..." }
Cursor-based pagination because transfers are time-ordered and the dataset grows without bound. Offset pagination breaks when new transfers land between page fetches, producing duplicate or missing rows at page boundaries.
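One common way to build the opaque cursor, assuming transfers sort by `(created_at, id)`: pack the last row's sort key into a URL-safe token so the next page can filter strictly past it. The encoding shown is an illustrative choice, not a requirement.

```python
import base64
import json

def encode_cursor(created_at: str, transfer_id: str) -> str:
    """Pack the last returned row's sort key into an opaque token."""
    raw = json.dumps([created_at, transfer_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple:
    """Recover the sort key; the next page queries rows strictly after it,
    e.g. WHERE (created_at, id) < (:created_at, :id) for newest-first."""
    created_at, transfer_id = json.loads(base64.urlsafe_b64decode(cursor))
    return created_at, transfer_id
```

Because the cursor names a concrete row rather than an offset, rows inserted between page fetches shift nothing: the next page always resumes exactly where the previous one ended.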
Authentication is out of scope for this design, but in production every endpoint requires a bearer token. The Transfer Service validates that the authenticated caller owns source_account_id before touching the ledger. Accepting arbitrary account IDs without ownership verification is an authorization vulnerability.
High-Level Design
We build the system incrementally, one requirement at a time, adding components only when the next requirement demands them.
1. Initiate a transfer (naive path)
The simplest correct design: one server, one database, one transaction wrapping both the debit and credit.
Components:
- Client: Any caller initiating a transfer via the REST API.
- Transfer Service: Validates the request, opens a DB transaction, debits source, credits destination, writes the transfer record.
- Accounts DB: Single PostgreSQL instance. Stores account balances and the transfers table. ACID transactions give us atomicity for free.
Request walkthrough:
- Client sends POST /transfers with source, destination, amount, and idempotency key.
- Transfer Service opens a database transaction.
- Read source account balance and check for sufficient funds.
- UPDATE accounts SET balance = balance - amount WHERE id = source_account_id.
- UPDATE accounts SET balance = balance + amount WHERE id = destination_account_id.
- INSERT INTO transfers (...) with status = COMPLETED.
- Commit. Return { transfer_id, status: COMPLETED } to the client.
This is the happy path. The single-transaction approach is correct and simple. Problems emerge when we add the idempotency and concurrency requirements.
2. Idempotency -- handling retries safely
Clients retry after network timeouts. Without an idempotency check, each retry debits the source again.
The failure scenario: Client sends POST /transfers with key abc-123. The transfer succeeds, but TCP drops before the response arrives. Client retries with the same key. Without protection, the source account is debited twice.
The fix is an idempotency table checked before executing any transfer logic. The check must be atomic with the execution -- more on that in the deep dives.
Components added:
- IdempotencyRecord table: Maps (idempotency_key, caller_id) to (transfer_id, response_body). Unique index on (idempotency_key, caller_id). Added to the same PostgreSQL instance for now.
Updated request walkthrough:
- Client sends POST /transfers with Idempotency-Key: abc-123.
- Transfer Service checks: does a row exist for (abc-123, caller_id)?
- If yes: Return the stored response. No transfer execution. No ledger writes.
- If no: Execute the transfer transaction. On commit, write the idempotency record atomically in the same transaction.
The idempotency check and transfer execution must be in the same transaction. If you check the key in one transaction and execute in a second, two concurrent requests with the same key can both pass the check and both execute. Use INSERT ... ON CONFLICT DO NOTHING to make the key reservation atomic with the transfer. This is the TOCTOU (time-of-check to time-of-use) race condition interviewers probe for.
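A runnable sketch of the atomic reservation, using SQLite's INSERT OR IGNORE as the local equivalent of PostgreSQL's INSERT ... ON CONFLICT DO NOTHING (schema and function shape are illustrative):

```python
import json
import sqlite3

def idempotent_transfer(conn, key, caller_id, execute):
    """Reserve the key and run the transfer in one transaction.

    The INSERT either wins (rowcount == 1) and we execute the transfer,
    or loses to an earlier request with the same (key, caller) and we
    replay the stored response. Reservation and execution share one
    transaction, so there is no check-then-act window.
    """
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO idempotency (key, caller_id) VALUES (?, ?)",
            (key, caller_id))
        if cur.rowcount == 0:  # duplicate: return the original response
            (body,) = conn.execute(
                "SELECT response FROM idempotency WHERE key=? AND caller_id=?",
                (key, caller_id)).fetchone()
            return json.loads(body)
        response = execute(conn)  # debit/credit inside the SAME transaction
        conn.execute(
            "UPDATE idempotency SET response=? WHERE key=? AND caller_id=?",
            (json.dumps(response), key, caller_id))
        return response
```

A production version must also handle the duplicate that arrives while the original is still in flight (the stored response is not yet written); a common choice is returning 409/"retry later" until the winner commits.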
3. Concurrent transfers -- preventing overdrafts
Two transfers from the same account running simultaneously can both read the same balance, both decide "sufficient funds", and both debit, producing a negative balance.
The race condition:
T1: Transfer A reads source balance = $500. Needs $400. Proceeds.
T2: Transfer B reads source balance = $500. Needs $300. Proceeds.
T3: Transfer A commits: balance = $100.
T4: Transfer B commits: balance = -$200. OVERDRAFT.
Both reads happen at READ COMMITTED isolation (PostgreSQL's default) before either write. READ COMMITTED does not prevent this check-then-act race: each transaction reads a genuinely committed balance, but nothing stops both from acting on the same stale value.
The fix is database-level concurrency control. I'll walk through the two main options in the deep dives, but the minimal change here is adding SELECT ... FOR UPDATE to the balance read.
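Alongside SELECT ... FOR UPDATE, a related guard worth knowing pushes the sufficient-funds check into the UPDATE's WHERE clause, making check and debit one atomic statement. A sketch with SQLite standing in for PostgreSQL:

```python
import sqlite3

def debit_if_sufficient(conn, account_id, amount):
    """Atomic check-and-debit: the WHERE clause IS the funds check.

    The database evaluates the guard and applies the debit as a single
    statement, so no interleaving of concurrent transfers can drive the
    balance negative. rowcount == 0 means insufficient funds.
    """
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount))
        return cur.rowcount == 1
```

This shape avoids holding a row lock across an application round trip, at the cost of learning "insufficient funds" only after the fact rather than from a prior read.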
Components added: No new services. This is a database-level locking decision.
4. Transfer status queries
The read path is lightweight but needs to scale without touching the primary DB.
Components added:
- Read Replica: Async-replicated copy of the Accounts DB. Status queries route here, not to the primary.
Request walkthrough:
- Client sends GET /transfers/{transfer_id}.
- Transfer Service routes the query to the read replica.
- Return the transfer record with current status.
Replica lag (10-50ms) is safe here because transfer status is monotonically forward-moving: PENDING to COMPLETED or FAILED, never backward. A slightly stale "PENDING" response prompts the client to poll again -- that is expected behavior, not a correctness failure.
Potential Deep Dives
1. How do we prevent concurrent debits from overdrafting an account at scale?
This is the most frequently probed question for payment system design. At 5,000 transfers/second with a large business account receiving hundreds of concurrent debits, lock contention becomes the system's bottleneck.
Constraints:
- No overdrafts: balance must never go negative after a debit.
- Latency budget: locking must not add more than ~50ms to the transfer path.
- Throughput: at 5,000 tps, popular accounts (payroll runs, marketplace settlements) may see 200-500 in-flight transfers at once.
2. How do we guarantee exactly-once execution (idempotency at scale)?
The question is not whether to use idempotency keys -- you must. The question is how to implement the check such that it is atomic with the transfer execution, even in a distributed deployment.
Constraints:
- Same key from the same caller must produce the same result every time.
- Two concurrent requests with the same key must not both execute the transfer.
- Keys expire after 24 hours (no need to store indefinitely).
3. How do we design the ledger for auditability and balance correctness?
A balance stored as a single mutable number is convenient but fragile. A bug that applies the wrong delta leaves no trace. Reconstructing the correct balance from history becomes impossible.
Constraints:
- Every balance change must be traceable to a specific transfer.
- Balance reads must complete in under 10ms for transfer eligibility checks.
- The system must support account statement generation covering 12 months of history.
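The checkpoint pattern these constraints point toward can be sketched in a few lines, again with SQLite standing in for PostgreSQL (the `balance_as_of` recovery column from the cheat sheet is omitted for brevity): every transfer appends immutable ledger entries and moves the balance checkpoint in one transaction, and a reconciliation job can re-derive any balance from the log.

```python
import sqlite3

def post_transfer(conn, transfer_id, source, destination, amount):
    """Append the debit/credit pair and move both balance checkpoints in
    one transaction, so the checkpoints never drift from the log."""
    with conn:
        conn.execute("INSERT INTO ledger_entries VALUES (?, ?, ?)",
                     (transfer_id, source, -amount))       # debit entry
        conn.execute("INSERT INTO ledger_entries VALUES (?, ?, ?)",
                     (transfer_id, destination, amount))   # credit entry
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, source))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, destination))

def reconcile(conn, account_id):
    """Recompute the balance from the append-only log (O(N)) and compare
    it to the O(1) checkpoint; a mismatch flags corruption to investigate."""
    (derived,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger_entries "
        "WHERE account_id = ?", (account_id,)).fetchone()
    (checkpoint,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)).fetchone()
    return derived == checkpoint
```

The checkpoint answers the sub-10ms eligibility read; the log answers the audit and statement requirements; reconciliation catches any bug that lets the two diverge.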
4. How do we handle extreme lock contention on popular accounts?
The account actor model (Deep Dive 1) serialises transfers per account and removes DB lock contention. But a single actor processes ~100 transfers/second. A payroll settlement account receiving 500 credits/second queues transfers faster than they drain.
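A minimal in-process sketch of the per-account actor, assuming a threaded runtime: one inbox and one worker per account, so transfers against that account can never interleave. (A production version routes requests by consistent hash on account ID and persists PENDING transfers so a crashed actor's queue can be replayed; neither is shown here.)

```python
import queue
import threading

class AccountActor:
    """One serial queue and one worker per account: no DB row lock needed,
    because only this worker ever touches this account's balance."""

    def __init__(self, opening_balance):
        self.balance = opening_balance
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            amount, done = self.inbox.get()
            # Serial execution: the sufficient-funds check and the debit
            # cannot interleave with any other transfer on this account.
            if self.balance + amount >= 0:
                self.balance += amount
                done.put(True)
            else:
                done.put(False)

    def apply(self, amount, timeout=5):
        """Enqueue a debit (negative) or credit (positive); block for result."""
        done = queue.Queue(maxsize=1)
        self.inbox.put((amount, done))
        return done.get(timeout=timeout)
```

The single worker is also the ~100 tps ceiling this deep dive is about: when credits arrive faster than the worker drains them, the fix is sharding the logical account into virtual sub-accounts, each with its own actor.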
5. How do we handle a transfer that partially fails across services (distributed saga)?
In a microservices architecture where the Transfer Service, Accounts DB, and an external Bank API are independent systems, the naive "call them all sequentially and hope" approach fails when any step fails mid-sequence.
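A sketch of the orchestration-based saga under simplifying assumptions (an in-memory log standing in for a durable saga-log table, and callables standing in for the real service calls): state is logged before and after each step, and a failure triggers compensation of completed steps in reverse order.

```python
STEPS = ["DEBIT_SOURCE", "CALL_BANK_API", "CREDIT_DESTINATION"]

class SagaLog:
    """Stand-in for a durable saga-log table. In production each record is
    written to the DB before the step runs; on restart, any saga whose
    last record is non-terminal is replayed or compensated."""

    def __init__(self):
        self.records = []

    def write(self, saga_id, step, status):
        self.records.append((saga_id, step, status))

def run_saga(saga_id, log, actions, compensations):
    """Orchestrated saga: log before and after each step; on a failure,
    compensate every already-completed step in reverse order. Bank-API
    compensation may be impossible, in which case the real system flags
    the saga for manual operations review instead."""
    completed = []
    for step in STEPS:
        log.write(saga_id, step, "STARTED")
        try:
            actions[step]()
        except Exception:
            for done in reversed(completed):
                compensations[done]()
                log.write(saga_id, done, "COMPENSATED")
            return "FAILED"
        completed.append(step)
        log.write(saga_id, step, "COMPLETED")
    return "COMPLETED"
```

Because every step is idempotent (the bank call carries the transfer ID as its reference), replaying a saga from its log after a crash cannot double-apply any step.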
Final Architecture
Component summary:
| Component | Role |
|---|---|
| API Gateway | Auth, rate limiting, Idempotency-Key forwarding to Transfer Service |
| Transfer Service | Saga orchestration, idempotency enforcement, routing writes to account actors |
| Account Actor System | Per-account serial queue, no DB row locks needed, crash-safe via PENDING replay |
| Redis Cluster | Distributed idempotency locks, actor in-memory queue state |
| Primary DB | All writes: accounts, ledger entries, transfers, idempotency records, saga log |
| Read Replica | Status queries, account statements, background reconciliation jobs |
| Bank API | External bank settlement; called with transfer ID as idempotency reference |
Interview Cheat Sheet
Signal the high ground immediately. "This is a distributed transaction problem at its core. The two hardest constraints are exactly-once execution (idempotency) and no overdrafts under concurrent load (concurrency control). Every design decision traces back to those two."
On the bank API difference from card networks. Wire transfers call bank APIs directly, not a card network. The bank's API behaves like any external HTTP call: it can timeout, succeed idempotently (if you send the same reference twice), or fail with no way to automatically compensate. There is no chargeback network to catch mistakes -- correctness is entirely your responsibility.
On concurrency (overdraft prevention):
- Pessimistic locking (SELECT FOR UPDATE): simple, correct, degrades above ~500 concurrent transfers to the same account due to lock queue buildup.
- Optimistic locking: good for low-contention accounts, collapses into retry storms under high contention.
- Account actor model: serialize at the application layer, remove DB locking entirely. One queue per account, one worker processing it. Requires consistent routing (consistent hash on account ID) and crash recovery (replay PENDING transfers on restart).
- Virtual sub-account sharding: partition a single logical account across N sub-accounts, each with its own actor. 10 sub-accounts = ~1,000 tps capacity per logical account.
On idempotency (exactly-once execution):
- Always client-supplied UUID. The client generates the key before the first attempt and reuses it on every retry.
- Atomic backstop: DB unique constraint on (idempotency_key, caller_id) with INSERT ... ON CONFLICT DO NOTHING. One insert wins; all others read the stored response.
- Redis distributed lock as an optimization layer for distributed nodes (avoids DB write for duplicate detection on well-behaved retries).
- TOCTOU is the failure mode interviewers probe for. Check-then-act in separate transactions is broken.
On the ledger design:
- Mutable balance column alone: no audit trail, regulatory failure, silent corruption risk.
- Pure append-only: full audit trail but balance derivation is O(N) over all history.
- Checkpoint pattern (production standard): append-only ledger_entries for auditability, a balance checkpoint on the account row for O(1) reads, and balance_as_of for crash recovery via replay.
On the saga pattern:
- Use orchestration-based saga when steps span multiple services or include external API calls.
- Write the saga log to durable storage before each step. On restart, replay non-terminal sagas.
- Bank API compensation (cancelling a completed bank transfer) is often impossible. Flag for manual operations review.
- Each step must be idempotent. Bank API calls carry the transfer ID as the reference so the bank deduplicates on retry.
Numbers to know:
- Pessimistic lock throughput per account: ~100 tps before queue buildup.
- Account actor throughput per account: ~100 tps (same ceiling, but no DB lock overhead and no deadlocks).
- Virtual sub-accounts (10 partitions): ~1,000 tps per logical account.
- Idempotency key TTL: 24 hours.
- Redis lock TTL: 30 seconds (generous for a <500ms transfer path).
- Read replica lag: 10-50ms (acceptable for forward-only status transitions).
Tradeoff table:
| Concurrency approach | Throughput/account | Complexity | Failure mode |
|---|---|---|---|
| SELECT FOR UPDATE | ~100 tps | Low | Lock queue under extreme load, deadlock risk |
| Optimistic locking | ~100 tps (low contention) | Medium | Retry storm under high contention |
| Account actor model | ~100 tps, partitionable | High | Actor crash loses queued transfers (mitigated by PENDING replay) |
| Virtual sub-accounts | 1,000+ tps | Very high | Balance aggregation complexity on large debits |