How fraud detection works
Modern fraud detection combines a rule engine for known patterns, an ML ensemble for scoring novel transactions, and a real-time feature store for sub-100ms decisions. Learn the architecture behind Stripe Radar, PayPal, and bank card fraud systems.
The Problem Statement
Interviewer: "You are building the fraud detection system for a payment processor like Stripe. A transaction comes in and you have less than 100 milliseconds to decide: approve, decline, or send to review. How do you architect a system that catches fraud without blocking legitimate customers?"
This question tests three things: your understanding of real-time ML inference pipelines, your ability to reason about the false positive versus false negative tradeoff that dominates fraud economics, and whether you know how to engineer features from raw event streams for sub-100ms decisions.
Most candidates jump straight to "use a machine learning model." That is only one layer. The real system has three layers working in concert: a rule engine for deterministic blocks (stolen card lists, velocity limits), a feature store that precomputes signals in real-time, and an ML scoring pipeline that handles the uncertain cases. Miss any layer and you either block too many legitimate customers or let fraud through.
I like this question because it combines real-time systems, machine learning in production, and a genuinely brutal business constraint: studies show that roughly 50% of customers whose legitimate transaction is declined will never retry and never return. Your fraud system is not just a classifier. It is a revenue protection system that must balance precision and recall against real dollar losses.
Clarifying the Scenario
You: "Before I architect this, I want to understand the constraints."
You: "When you say sub-100ms, is that the total time from transaction arrival to approve/decline response? Or is there additional time for the payment network round-trip?"
Interviewer: "100ms is your budget for the fraud decision. The payment network adds its own latency on top."
You: "Got it. What is the fraud rate we are dealing with? Consumer card fraud is typically 0.1% to 1% of transactions."
Interviewer: "Assume 0.1% fraud rate. Very imbalanced."
You: "And should I cover just card-not-present fraud (online transactions), or also card-present (in-store)?"
Interviewer: "Focus on card-not-present. That is where most of the fraud happens."
You: "OK. I will structure this in three layers: a deterministic rule engine for hard blocks, a real-time feature store for signal computation, and an ML scoring pipeline for the uncertain middle. Then I will talk about the feedback loop, because the model needs to learn from chargebacks that arrive 30 to 60 days after the transaction."
My Approach
I break this into five parts:
- The three-layer architecture: Rule engine, feature store, ML scoring, and how they chain together within the latency budget
- Real-time feature store: How to compute velocity features, device fingerprints, and behavioral signals in real-time
- ML scoring pipeline: Model selection, the class imbalance problem, and how to train on a 0.1% positive rate
- Adversarial adaptation: How fraudsters evolve tactics and how the system must retrain and adapt
- Action routing: How to route approve, decline, stepped-up auth (3DS), and manual review decisions
The mental model I use: think of the fraud system as airport security. The rule engine is the no-fly list check, an instant, deterministic lookup. The feature store is the baggage scanner, computing features about you in real-time (travel history, booking patterns, companion travelers). The ML model is the behavioral profiler who watches body language and decides whether you need additional screening. And manual review is the secondary screening room for uncertain cases.
At Stripe's scale (hundreds of millions of transactions per year), even a 0.01% improvement in fraud detection precision saves millions of dollars annually. And a 0.01% increase in false positive rate blocks thousands of legitimate customers who may never come back. Every basis point matters at scale.
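To make the base-rate problem concrete, here is a small worked example under assumed numbers. The 0.1% fraud rate comes from the scenario above; the 90% recall and 1% false positive rate are hypothetical operating points chosen only for illustration, and `precision_at_base_rate` is a name introduced here.

```python
def precision_at_base_rate(fraud_rate: float, recall: float,
                           false_positive_rate: float) -> float:
    """Fraction of flagged transactions that are actually fraudulent."""
    true_pos = fraud_rate * recall                         # fraud that gets flagged
    false_pos = (1.0 - fraud_rate) * false_positive_rate   # legitimate txns flagged
    return true_pos / (true_pos + false_pos)


# Assumed operating point: 0.1% fraud rate, 90% recall, 1% false positive rate.
precision = precision_at_base_rate(0.001, 0.90, 0.01)
legit_flagged_per_fraud_caught = (1.0 - precision) / precision

print(f"precision: {precision:.1%}")
print(f"legitimate customers flagged per fraud caught: "
      f"{legit_flagged_per_fraud_caught:.1f}")
```

Under these assumptions, precision is only about 8%, so roughly eleven legitimate customers are flagged for every fraudulent transaction caught. That is why a classifier that looks accurate on paper can still be expensive at a 0.1% fraud rate.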
The Architecture
The full three-layer fraud detection pipeline runs from transaction arrival to decision: rule engine first, then the feature store, then ML scoring, then decision routing.
Let me walk through the latency budget:
| Layer | Budget | What happens |
|---|---|---|
| Rule engine | < 5ms | Blocklist hash lookup, Redis counter check, geo distance calc |
| Feature store | < 20ms | Redis GET for real-time features, join with batch features |
| ML inference | < 50ms | GBM predict + graph model score + ensemble combination |
| Decision routing | < 5ms | Threshold comparison and action dispatch |
| Total | < 80ms | Leaves 20ms buffer within 100ms SLA |
The rule engine fires first because it handles the obvious cases instantly. If a card is on the stolen list, there is no point computing features or running ML inference. The rule engine rejects about 2-5% of transactions before they reach the expensive layers.
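A minimal sketch of that first layer, assuming an in-memory blocklist and a velocity count already supplied by a counter store. The `Transaction` fields, the blocklist contents, and the `MAX_TXNS_PER_HOUR` limit are all hypothetical; a production rule engine would load its rules from configuration rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    card_hash: str
    amount_cents: int
    txn_count_last_hour: int  # assumed to come from a real-time counter


# Hypothetical values for illustration only.
STOLEN_CARDS = frozenset({"card_abc"})
MAX_TXNS_PER_HOUR = 20


def rule_engine(txn: Transaction) -> Optional[str]:
    """Return a hard-decline reason, or None to pass the txn to later layers."""
    if txn.card_hash in STOLEN_CARDS:          # O(1) hash lookup
        return "decline:stolen_card"
    if txn.txn_count_last_hour > MAX_TXNS_PER_HOUR:
        return "decline:velocity_limit"
    return None


print(rule_engine(Transaction("card_abc", 4_999, 1)))
print(rule_engine(Transaction("card_xyz", 4_999, 3)))
```

Returning `None` rather than "approve" is deliberate in this sketch: passing the rules only means the transaction is not an obvious block, so it still needs to be scored.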
The feature store lookup runs in parallel with model loading and gathers three inputs: real-time features from Redis, historical features from a pre-joined lookup table, and device fingerprint matches. These are all read operations against values precomputed by streaming and batch pipelines.
The ML ensemble takes the feature vector and produces a fraud probability score. This is the most computationally expensive step, but well-optimized inference with a gradient-boosted model (GBM) on 200 features takes about 1-5ms. The bulk of the "50ms" budget is for feature assembly and network hops, not model inference itself.
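One way to sketch how the layers chain together is below. The three callables are stand-ins for the rule engine, feature store, and ML ensemble; the score thresholds and action names are hypothetical and would be tuned against fraud losses and false-decline costs in a real system.

```python
from typing import Callable, Dict, Optional

# Hypothetical thresholds for illustration only.
DECLINE_AT = 0.90
REVIEW_AT = 0.60
STEP_UP_AT = 0.30


def route(score: float) -> str:
    """Map a fraud probability to an action."""
    if score >= DECLINE_AT:
        return "decline"
    if score >= REVIEW_AT:
        return "manual_review"
    if score >= STEP_UP_AT:
        return "step_up_3ds"
    return "approve"


def decide(txn: Dict,
           rules: Callable[[Dict], Optional[str]],
           fetch_features: Callable[[Dict], Dict],
           score: Callable[[Dict], float]) -> str:
    """Run the layers in order, short-circuiting on a hard rule hit."""
    verdict = rules(txn)
    if verdict is not None:
        return verdict                 # skip feature fetch and ML inference
    features = fetch_features(txn)
    return route(score(features))


# Stub layers, assumed for the example.
result = decide(
    {"card": "card_xyz"},
    rules=lambda txn: None,
    fetch_features=lambda txn: {"txns_1h": 2},
    score=lambda feats: 0.42,
)
print(result)
```

The short-circuit is the point: a rule hit returns before any feature fetch or model call, which is what keeps the expensive layers off the path for obvious cases.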
Real-Time Feature Store and Velocity Features
The feature store is the backbone of the entire system. The ML model is only as good as the features it receives. And for fraud detection, the most predictive features are real-time velocity signals that must be computed within milliseconds.
What makes a good fraud feature
The best fraud features capture behavioral deviation: is this transaction different from what we expect for this card, this user, this merchant? Here are the categories:
| Feature Category | Examples | Update Frequency | Storage |
|---|---|---|---|
| Velocity | Txns in last 1h, 24h, 7d for this card | Per-transaction | Redis sorted sets |
| Monetary | Avg txn amount (30d), max single txn | Daily batch + streaming delta | Feature table |
| Geographic | Distance from last txn, country risk score | Per-transaction | Redis + GeoIP |
| Device | Fingerprint match, new device flag, VPN detection | Per-session | Redis hash |
| Behavioral | Time between page load and purchase, typing speed | Per-session | Session store |
| Account | Age, email domain, verification status | Rarely changes | User table |
| Network | Shared device with known fraud account, card-to-email graph | Hourly batch | Graph DB |
The real-time features (velocity, geographic, device) are the hardest to build because they must be computed within the 20ms feature store budget.
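As an illustration of the geographic row in the table, here is a sketch of a "distance from last txn" feature using the haversine great-circle formula, plus the implied travel speed between two transactions. The function names and example coordinates are my own, and resolving an IP or billing address to coordinates is assumed to happen elsewhere.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(distance_km: float, seconds_elapsed: float) -> float:
    """Speed needed to cover the distance between two transactions."""
    if seconds_elapsed <= 0:
        return float("inf") if distance_km > 0 else 0.0
    return distance_km / (seconds_elapsed / 3600.0)


# Example: a card used in New York, then in London 30 minutes later.
distance = haversine_km(40.7128, -74.0060, 51.5074, -0.1278)
speed = implied_speed_kmh(distance, 30 * 60)
print(f"{distance:.0f} km apart, implied speed {speed:.0f} km/h")
```

An implied speed far above what a commercial flight can do is a strong signal on its own. For card-not-present traffic, though, VPNs and proxies make IP geolocation noisy, so it works better as a model feature than as a hard rule.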
The trickiest part of the feature store is maintaining windowed counters in real-time. "Number of transactions from this card in the last 24 hours" sounds simple, but at millions of transactions per day, you need an efficient data structure that supports both increment and time-based expiry.
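A minimal in-memory sketch of such a windowed counter follows. It keeps one queue of timestamps per card and evicts expired entries on each read or write. This mirrors the Redis sorted set pattern (add with the timestamp as the score, trim by score range, then count), but it is a single-process illustration under my own class and method names, not the production design. It also assumes timestamps arrive in non-decreasing order per key.

```python
from collections import defaultdict, deque
from typing import Deque, DefaultDict


class SlidingWindowCounter:
    """Counts events per key within a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def _evict(self, key: str, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        cutoff = now - self.window_seconds
        queue = self._events[key]
        while queue and queue[0] <= cutoff:
            queue.popleft()

    def record(self, key: str, timestamp: float) -> None:
        self._evict(key, timestamp)
        self._events[key].append(timestamp)

    def count(self, key: str, now: float) -> int:
        self._evict(key, now)
        return len(self._events[key])


# Transactions on one card at t=0s, t=1800s and t=3599s, with a 1-hour window.
counter = SlidingWindowCounter(window_seconds=3600)
for ts in (0.0, 1800.0, 3599.0):
    counter.record("card_xyz", ts)

print(counter.count("card_xyz", now=3599.0))  # all three still in the window
print(counter.count("card_xyz", now=5400.0))  # only the t=3599s event remains
```

Storing every timestamp gives exact counts but costs memory proportional to the number of events in the window. Bucketed counters, for example one bucket per minute, trade a little accuracy for a fixed memory footprint, which may matter more at millions of transactions per day.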