Rate limiter
Low-level design of an in-process rate limiter -- token bucket, sliding window, and leaky bucket algorithms. Covers per-key limits, burst handling, thread-safe token management, and composable multi-tier limiting.
The Problem
Your API gateway handles 50,000 requests per second across 10,000 active users. A handful of users (scrapers, misbehaving clients, accidental retry loops) send 500+ requests in a single second, overwhelming downstream services and degrading the experience for everyone else. The operations team currently blocks abusers manually, but by the time they react the damage is already done.
A rate limiter solves this by enforcing a maximum request rate per user, IP, or API key. Each incoming request checks whether the caller has budget remaining. If yes, the request proceeds and the budget decreases. If no, the request is rejected immediately with a 429 status and a "retry after" hint. Different algorithms (token bucket, sliding window, leaky bucket) offer different tradeoffs between burst tolerance, memory usage, and precision.
Design the core classes for an in-process rate limiter that supports pluggable algorithms (token bucket, sliding window, leaky bucket), per-key limiting, configurable burst allowance, thread-safe concurrent access, and composable multi-tier rules.
Requirements
Clarifying Questions
Before jumping into class design, ask questions to turn the vague prompt into a concrete specification. Cover four areas: core actions, error handling, boundaries, and future extensions.
You: "Which rate-limiting algorithm should we support? Just one, or multiple?"
Interviewer: "Start with token bucket. Design it so adding sliding window, leaky bucket, or any custom algorithm is a one-class change."
Pluggable algorithms. That is a textbook Strategy pattern. The limiter delegates the allow/deny decision to an algorithm object behind a common interface.
You: "Is the limit per-user, per-IP, global, or configurable?"
Interviewer: "Per-key. The key could be a user ID, IP address, or API key. Each key gets its own limiter instance with its own configuration."
Per-key means a registry that maps each key to its own limiter. ConcurrentHashMap with computeIfAbsent is the natural fit here.
You: "Should some requests cost more than others? For example, a bulk export costs 10 tokens while a simple read costs 1."
Interviewer: "Yes. The caller specifies how many tokens a request costs."
Variable cost per request. The tryAcquire method needs a permits parameter, not just a boolean "one request happened."
You: "What happens when a request is denied? Should we reject immediately or queue it?"
Interviewer: "Reject immediately and return how long the caller should wait before retrying."
Reject with retry-after. That means our result object needs to include the wait duration, not just allowed/denied.
You: "Do we need to support bursts? For example, a user who has been idle should be able to send a burst of requests up to some limit."
Interviewer: "Yes for the token bucket algorithm. Sliding window and leaky bucket inherently don't support bursting."
Burst allowance is algorithm-specific. Token bucket accumulates tokens during idle periods (up to a max capacity), which enables bursting. The other algorithms enforce a smooth rate.
You: "Should we support multi-tier limiting? For example, 100 requests per second AND 10,000 per hour for the same user."
Interviewer: "Yes, as a composition feature. A request passes only if ALL tiers allow it."
Multi-tier means a composite limiter that chains multiple limiters. All must approve for the request to proceed.
You: "Is this in-process only, or does it need to work across multiple servers via Redis?"
Interviewer: "Design for in-process. Mention distributed (Redis-backed) as an extension."
Good. In-process keeps the core simple. We avoid network round-trips and clock synchronization issues.
You: "Does the limiter need to be thread-safe?"
Interviewer: "Absolutely. Multiple threads call tryAcquire concurrently for the same key."
Thread safety for concurrent access on the same key. We need either synchronized blocks or atomic operations for token manipulation.
Perfect. You have now clarified scope and ruled out unnecessary complexity.
Final Requirements
Functional Requirements:
tryAcquire(key)andtryAcquire(key, permits)check whether a request is allowed and consume budget if so- Support token bucket (burst-friendly), sliding window (precise smoothing), and leaky bucket (constant output rate) algorithms
- Return a result containing: allowed/denied, remaining budget, reset time, and retry-after duration
- Per-key limiting where each key gets its own independent limiter instance
- Composable multi-tier rules (e.g., 100/sec AND 10,000/hour) where all tiers must pass
Non-Functional Requirements:
- Thread-safe for concurrent tryAcquire calls on the same key
- Sub-microsecond happy-path latency (no locks on the common case when possible)
- Pluggable algorithm via Strategy pattern so new algorithms require zero changes to existing code
Out of Scope:
- Distributed rate limiting (Redis backend)
- HTTP middleware integration
- Persistence across restarts
- Dynamic configuration reloading
Interview tip: number your requirements
Writing numbered requirements on the whiteboard shows the interviewer you are methodical. It also gives you reference points: "Requirement 5 drives the Composite pattern choice."
Example Inputs and Outputs
Scenario 1: Token bucket allows burst then throttles
- Setup: Token bucket with capacity 5, refill rate 2 tokens/sec
- Input: 5 rapid requests for key "user-42"
- Expected: All 5 allowed (burst from full bucket). 6th request denied with
retryAfter = 500ms - Why: Validates burst behavior and budget exhaustion
Scenario 2: Sliding window enforces smooth rate
- Setup: Sliding window with max 3 requests per 1-second window
- Input: 3 requests at t=0.0s, 1 request at t=0.5s
- Expected: First 3 allowed. 4th denied because 3 requests already exist in the [0.0, 1.0) window
- Why: Validates that sliding window has no burst allowance and counts precisely
Scenario 3: Multi-tier composite limiting
- Setup: Tier 1 = 5/sec (token bucket), Tier 2 = 20/min (sliding window)
- Input: 5 requests in second 1, 5 in second 2, 5 in second 3, 5 in second 4
- Expected: All pass for seconds 1-4 (20 total). First request in second 5 is denied by Tier 2 (20/min exhausted) even though Tier 1 has budget
- Why: Validates that the composite limiter enforces the most restrictive tier
Try It Yourself
Try it yourself
Before reading the solution, spend 15-20 minutes sketching the core interface and one algorithm implementation. Focus on how tryAcquire decides whether to allow a request and what data structure tracks usage. Think about thread safety early. Compare your approach with the walkthrough below.
Step 1: Identify Core Entities
Start by asking: what are the main "things" in this problem? Look for nouns in your requirements. A rate limiter has the algorithm that decides allow/deny, a configuration that defines limits, a result object that communicates the decision, and a registry that maps keys to individual limiter instances.
A common mistake is lumping the algorithm, registry, and configuration into one big class. Each has a different reason to change. The algorithm changes when you swap token bucket for sliding window. The registry changes when you switch from in-memory to Redis. The configuration changes when you add new parameters. Keep them separate.
| Entity | Responsibility | Key attributes |
|---|---|---|
| RateLimiter | Strategy interface. Decides allow/deny for a single key's limiter instance. | tryAcquire(permits) |
| RateLimiterConfig | Immutable configuration for a limiter (capacity, rate, window size). | maxPermits, refillRate, windowSize |
| TokenBucketLimiter | Strategy implementation. Allows bursts up to capacity, then smooth refill. | tokens, capacity, refillRate, lastRefillTime |
| SlidingWindowLimiter | Strategy implementation. Counts requests in a rolling time window. | windowSize, maxRequests, counters |
| LeakyBucketLimiter | Strategy implementation. Queue-based, processes at constant output rate. | queueSize, leakRate, waterLevel |
| RateLimitResult | Value object returned to caller. Communicates the decision and metadata. | allowed, remaining, resetAt, retryAfter |
| RateLimiterRegistry | Per-key management. Maps each key to its own limiter instance. | limiters (ConcurrentHashMap) |
| CompositeRateLimiter | Chains multiple limiters. Request passes only if ALL approve. | children (List of RateLimiter) |
Notice we separated the algorithm (RateLimiter implementations) from the per-key management (RateLimiterRegistry). The registry is responsible for creating and looking up limiter instances. The limiter itself only knows how to check a single key's budget. This means you can test each algorithm in isolation without any registry logic.
Step 2: Define Relationships and Class Design
RateLimiter (Strategy Interface)
The core abstraction. Every algorithm implements this interface, and the registry doesn't care which algorithm is behind it.
Deriving methods from requirements:
| Requirement | Method |
|---|---|
| "Check whether a request is allowed and consume budget" | tryAcquire(int permits) |
| "Return remaining budget and reset time" | peek() (non-consuming check) |
RateLimitResult (Value Object)
Deriving state from requirements:
| Requirement | What RateLimitResult must carry |
|---|---|
| "Allowed or denied" | boolean allowed |
| "Remaining budget" | int remaining |
| "When the budget resets" | Instant resetAt |
| "How long to wait before retrying" | Duration retryAfter (null if allowed) |
We use a Java record for this. Immutable, compact, no boilerplate.
RateLimiterRegistry (Per-Key Management)
The registry is the entry point callers interact with. It maps string keys to limiter instances, creating them lazily on first access. Thread safety comes from ConcurrentHashMap's computeIfAbsent, which guarantees at-most-once creation per key.
| Requirement | What the Registry must track |
|---|---|
| "Per-key limiting" | ConcurrentHashMap<String, RateLimiter> |
| "Pluggable algorithm" | RateLimiterFactory to create the right algorithm |
| "Cleanup for inactive keys" | remove(key) to reclaim memory |
Key Relationship Decisions
RateLimiterRegistry owns RateLimiter instances (composition). Limiters don't exist outside a registry. When a key is removed, its limiter is garbage collected.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.