Rate limiter

The Problem

Your API gateway handles 50,000 requests per second across 10,000 active users. A handful of users (scrapers, misbehaving clients, accidental retry loops) send 500+ requests in a single second, overwhelming downstream services and degrading the experience for everyone else. The operations team currently blocks abusers manually, but by the time they react the damage is already done.

A rate limiter solves this by enforcing a maximum request rate per user, IP, or API key. Each incoming request checks whether the caller has budget remaining. If yes, the request proceeds and the budget decreases. If no, the request is rejected immediately with a 429 status and a "retry after" hint. Different algorithms (token bucket, sliding window, leaky bucket) offer different tradeoffs between burst tolerance, memory usage, and precision.

Design the core classes for an in-process rate limiter that supports pluggable algorithms (token bucket, sliding window, leaky bucket), per-key limiting, configurable burst allowance, thread-safe concurrent access, and composable multi-tier rules.

Requirements

Clarifying Questions

Before jumping into class design, ask questions to turn the vague prompt into a concrete specification. Cover four areas: core actions, error handling, boundaries, and future extensions.

You: "Which rate-limiting algorithm should we support? Just one, or multiple?"

Interviewer: "Start with token bucket. Design it so adding sliding window, leaky bucket, or any custom algorithm is a one-class change."

Pluggable algorithms. That is a textbook Strategy pattern. The limiter delegates the allow/deny decision to an algorithm object behind a common interface.

You: "Is the limit per-user, per-IP, global, or configurable?"

Interviewer: "Per-key. The key could be a user ID, IP address, or API key. Each key gets its own limiter instance with its own configuration."

Per-key means a registry that maps each key to its own limiter. ConcurrentHashMap with computeIfAbsent is the natural fit here.

You: "Should some requests cost more than others? For example, a bulk export costs 10 tokens while a simple read costs 1."

Interviewer: "Yes. The caller specifies how many tokens a request costs."

Variable cost per request. The tryAcquire method needs a permits parameter, not just a boolean "one request happened."

You: "What happens when a request is denied? Should we reject immediately or queue it?"

Interviewer: "Reject immediately and return how long the caller should wait before retrying."

Reject with retry-after. That means our result object needs to include the wait duration, not just allowed/denied.

You: "Do we need to support bursts? For example, a user who has been idle should be able to send a burst of requests up to some limit."

Interviewer: "Yes for the token bucket algorithm. Sliding window and leaky bucket inherently don't support bursting."

Burst allowance is algorithm-specific. Token bucket accumulates tokens during idle periods (up to a max capacity), which enables bursting. The other algorithms enforce a smooth rate.

You: "Should we support multi-tier limiting? For example, 100 requests per second AND 10,000 per hour for the same user."

Interviewer: "Yes, as a composition feature. A request passes only if ALL tiers allow it."

Multi-tier means a composite limiter that chains multiple limiters. All must approve for the request to proceed.

You: "Is this in-process only, or does it need to work across multiple servers via Redis?"

Interviewer: "Design for in-process. Mention distributed (Redis-backed) as an extension."

Good. In-process keeps the core simple. We avoid network round-trips and clock synchronization issues.

You: "Does the limiter need to be thread-safe?"

Interviewer: "Absolutely. Multiple threads call tryAcquire concurrently for the same key."

Thread safety for concurrent access on the same key. We need either synchronized blocks or atomic operations for token manipulation.

Perfect. You have now clarified scope and ruled out unnecessary complexity.

Final Requirements

Functional Requirements:

tryAcquire(key) and tryAcquire(key, permits) check whether a request is allowed and consume budget if so
Support token bucket (burst-friendly), sliding window (precise smoothing), and leaky bucket (constant output rate) algorithms
Return a result containing: allowed/denied, remaining budget, reset time, and retry-after duration
Per-key limiting where each key gets its own independent limiter instance
Composable multi-tier rules (e.g., 100/sec AND 10,000/hour) where all tiers must pass

Non-Functional Requirements:

Thread-safe for concurrent tryAcquire calls on the same key
Sub-microsecond happy-path latency (no locks on the common case when possible)
Pluggable algorithm via Strategy pattern so new algorithms require zero changes to existing code

Out of Scope:

Distributed rate limiting (Redis backend)
HTTP middleware integration
Persistence across restarts
Dynamic configuration reloading

Interview tip: number your requirements

Writing numbered requirements on the whiteboard shows the interviewer you are methodical. It also gives you reference points: "Requirement 5 drives the Composite pattern choice."

Example Inputs and Outputs

Scenario 1: Token bucket allows burst then throttles

Setup: Token bucket with capacity 5, refill rate 2 tokens/sec
Input: 5 rapid requests for key "user-42"
Expected: All 5 allowed (burst from full bucket). 6th request denied with retryAfter = 500ms
Why: Validates burst behavior and budget exhaustion

Scenario 2: Sliding window enforces smooth rate

Setup: Sliding window with max 3 requests per 1-second window
Input: 3 requests at t=0.0s, 1 request at t=0.5s
Expected: First 3 allowed. 4th denied because 3 requests already exist in the [0.0, 1.0) window
Why: Validates that sliding window has no burst allowance and counts precisely

Scenario 3: Multi-tier composite limiting

Setup: Tier 1 = 5/sec (token bucket), Tier 2 = 20/min (sliding window)
Input: 5 requests in second 1, 5 in second 2, 5 in second 3, 5 in second 4
Expected: All pass for seconds 1-4 (20 total). First request in second 5 is denied by Tier 2 (20/min exhausted) even though Tier 1 has budget
Why: Validates that the composite limiter enforces the most restrictive tier

Try It Yourself

Try it yourself

Before reading the solution, spend 15-20 minutes sketching the core interface and one algorithm implementation. Focus on how tryAcquire decides whether to allow a request and what data structure tracks usage. Think about thread safety early. Compare your approach with the walkthrough below.

Step 1: Identify Core Entities

Start by asking: what are the main "things" in this problem? Look for nouns in your requirements. A rate limiter has the algorithm that decides allow/deny, a configuration that defines limits, a result object that communicates the decision, and a registry that maps keys to individual limiter instances.

A common mistake is lumping the algorithm, registry, and configuration into one big class. Each has a different reason to change. The algorithm changes when you swap token bucket for sliding window. The registry changes when you switch from in-memory to Redis. The configuration changes when you add new parameters. Keep them separate.

Entity	Responsibility	Key attributes
RateLimiter	Strategy interface. Decides allow/deny for a single key's limiter instance.	tryAcquire(permits)
RateLimiterConfig	Immutable configuration for a limiter (capacity, rate, window size).	maxPermits, refillRate, windowSize
TokenBucketLimiter	Strategy implementation. Allows bursts up to capacity, then smooth refill.	tokens, capacity, refillRate, lastRefillTime
SlidingWindowLimiter	Strategy implementation. Counts requests in a rolling time window.	windowSize, maxRequests, counters
LeakyBucketLimiter	Strategy implementation. Queue-based, processes at constant output rate.	queueSize, leakRate, waterLevel
RateLimitResult	Value object returned to caller. Communicates the decision and metadata.	allowed, remaining, resetAt, retryAfter
RateLimiterRegistry	Per-key management. Maps each key to its own limiter instance.	limiters (ConcurrentHashMap)
CompositeRateLimiter	Chains multiple limiters. Request passes only if ALL approve.	children (List of RateLimiter)

Notice we separated the algorithm (RateLimiter implementations) from the per-key management (RateLimiterRegistry). The registry is responsible for creating and looking up limiter instances. The limiter itself only knows how to check a single key's budget. This means you can test each algorithm in isolation without any registry logic.

Step 2: Define Relationships and Class Design

RateLimiter (Strategy Interface)

The core abstraction. Every algorithm implements this interface, and the registry doesn't care which algorithm is behind it.

Deriving methods from requirements:

Requirement	Method
"Check whether a request is allowed and consume budget"	`tryAcquire(int permits)`
"Return remaining budget and reset time"	`peek()` (non-consuming check)

RateLimitResult (Value Object)

Deriving state from requirements:

Requirement	What RateLimitResult must carry
"Allowed or denied"	`boolean allowed`
"Remaining budget"	`int remaining`
"When the budget resets"	`Instant resetAt`
"How long to wait before retrying"	`Duration retryAfter` (null if allowed)

We use a Java record for this. Immutable, compact, no boilerplate.

RateLimiterRegistry (Per-Key Management)

The registry is the entry point callers interact with. It maps string keys to limiter instances, creating them lazily on first access. Thread safety comes from ConcurrentHashMap's computeIfAbsent, which guarantees at-most-once creation per key.

Requirement	What the Registry must track
"Per-key limiting"	`ConcurrentHashMap<String, RateLimiter>`
"Pluggable algorithm"	`RateLimiterFactory` to create the right algorithm
"Cleanup for inactive keys"	`remove(key)` to reclaim memory

Key Relationship Decisions

RateLimiterRegistry owns RateLimiter instances (composition). Limiters don't exist outside a registry. When a key is removed, its limiter is garbage collected.