Rate Limiter
Design a distributed rate limiter that throttles requests per user or IP across a fleet of servers, covering token bucket, sliding window, and Redis-backed strategies at FAANG scale.
What is an API rate limiter?
A rate limiter controls how many requests a client can make in a given time window. Visit the Twitter API twice a second or a thousand times a minute and you cross a threshold; the next request gets a 429. The interesting engineering problem is not the counting itself; it is counting atomically across a fleet of servers without a per-request distributed lock, while keeping reject latency under 5ms and surviving Redis node failures without cascading 429s that bring your entire service down.
I open every rate limiter interview by asking: "What happens to your users when Redis dies?" The answer reveals whether someone has built one or just read about one.
Functional Requirements
Core Requirements
- Limit requests from a client to N per time window (for example, 1,000 req/min per API key).
- Return HTTP 429 with a `Retry-After` header when the limit is exceeded.
- Rules can vary per endpoint and per client tier (free vs. premium).
- Rules are configurable without a code deploy.
Below the Line (out of scope)
- Billing or quota enforcement
- Full WAF protection or DDoS mitigation
- Geographic restrictions
The hardest part in scope: The distributed counter race condition. Counting requests in a single process is trivial. Counting them accurately across dozens of stateless servers sharing a Redis instance, without a lock per request, under sub-second time windows is the actual engineering challenge. We will spend two full deep dives on it.
Billing and quota enforcement are below the line because they require integration with a payments system and a separate overage billing model. To add them, tie the rate limit rules to a subscription_tier column in a billing table. When a request arrives, the rule lookup resolves the tier and checks the billing system for active quotas.
The rate limiter becomes a quota enforcer rather than just a traffic shaper.
WAF and DDoS mitigation are below the line because they require packet-level inspection, IP reputation scoring, and bot fingerprinting. These belong in a dedicated network appliance (CloudFlare, AWS WAF) sitting upstream of the rate limiter. The rate limiter handles API-level semantics; a WAF handles transport-level threats.
Geographic restrictions are below the line because they require a GeoIP lookup on every request and a rules table with a region dimension. To add them, tag the request context with a region code and add a region column to the LimitRule table. Rules then resolve on (api_key, endpoint, region) instead of (api_key, endpoint).
Non-Functional Requirements
Core Requirements
- Latency: The rate-limiting check adds less than 5ms p99 to every request. At 5ms, rate limiting is invisible to users but measurable in profiling.
- Availability: 99.99% uptime. On limiter failure, the system fails open (allows traffic) rather than failing closed (rejects all traffic).
- Throughput: Handle up to 1M requests per second across the fleet without degrading check latency.
- Scale: Support 10,000 concurrent API keys, each with independent per-key, per-endpoint, per-tier rules.
- Consistency: Prevent more than a 5-10% overshoot on the stated limit under burst conditions. Exact accuracy is not required, but wild overruns (10x the limit) are not acceptable.
Below the Line
- Sub-millisecond p99 latency (requires co-locating Redis with every app server, operationally complex)
- Persistent audit log of every reject event (needed for billing-grade enforcement but not for throttling)
Read/write ratio: Every incoming API request triggers one Redis counter write (INCR) and one implicit read (INCR returns the new value). This is roughly 1:1, not the read-heavy ratio we see in most systems. Rule lookups are the exception: rules update infrequently (perhaps once per hour), so rule reads are cached in memory on each gateway node. The counter INCR is the performance-critical operation on every single request, and it must be atomic.
The 1:1 counter read/write ratio means we cannot rely on read-heavy optimizations like CDN caching. The counter must be incremented and checked atomically on every call. This pushes toward Redis INCR, which combines the read (returns the new value) and the write (increments the counter) in a single atomic command.
I would call out the 1:1 ratio early in the interview because it immediately disqualifies half the caching patterns interviewers expect to hear. That one sentence reframes the entire conversation.
Core Entities
- LimitRule: The configuration for how many requests are allowed per window for a given combination of `(key_type, tier, endpoint)`. Carries the limit N, the window duration in seconds, and the rule ID.
- RequestCounter: The current count of requests for a specific client in the current time window. Stored in Redis as a key with a TTL equal to the window duration. Never persisted to a relational database.
- ClientIdentity: The resolved identifier used for rate-limit bucketing (API key, user ID, or IP address). Determines which `LimitRule` applies to a given request.
Full schema details for the rules table, including indexes, tier columns, and composite key structure, are deferred to the deep dives. The three entities above are sufficient to drive the API design and High-Level Design.
API Design
The rate limiter is middleware, not a public-facing API. Clients never call it directly; it intercepts requests at the gateway layer. Two explicit interfaces are worth defining: the management API for configuring rules, and the internal check API used by gateway components that are not co-located with the limiter logic.
Configure a rate limit rule:
POST /rate-limit-rules
Body: {
key_type: "api_key",
tier: "free",
endpoint: "/search",
limit: 100,
window_seconds: 60
}
Response: { rule_id: "rl_abc123" }
Get a rule:
GET /rate-limit-rules/{rule_id}
Response: { rule_id, key_type, tier, endpoint, limit, window_seconds }
Internal rate check (used by distributed gateway nodes that delegate to a centralized checker):
POST /check
Body: { key: "api-key-abc123", endpoint: "/search", tier: "free" }
Response: {
allowed: true,
remaining: 45,
limit: 100,
reset_at: 1720000060
}
When a request is rejected, the response includes the full set of throttle headers:
HTTP 429 Too Many Requests
Headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1720000060
Retry-After: 37
Body: { error: "rate_limit_exceeded", retry_after_seconds: 37 }
The X-RateLimit-* headers appear on every response, including successful ones, so well-behaved clients can self-throttle before hitting the limit. Retry-After is expressed in seconds (not an absolute timestamp) so client libraries can implement exponential backoff without timestamp arithmetic.
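The header contract above is mechanical enough to sketch directly. The following is an illustrative helper (the function name and signature are invented for this example), showing how the headers and the seconds-based `Retry-After` fall out of the counter state:

```python
def throttle_headers(limit: int, count: int, reset_at: int, now: float) -> dict:
    """Build the X-RateLimit-* headers from the current counter state.

    `reset_at` is the Unix timestamp when the current window ends;
    Retry-After is derived from it in seconds, per the contract above.
    """
    remaining = max(limit - count, 0)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_at),
    }
    if count > limit:  # rejected request: add Retry-After in seconds
        headers["Retry-After"] = str(max(int(reset_at - now), 0))
    return headers

# A request at t=1720000023 against a limit of 100 that is already exhausted:
print(throttle_headers(limit=100, count=101, reset_at=1720000060, now=1720000023))
# Retry-After works out to 37 seconds, matching the example response above.
```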
High-Level Design
1. Single server naive: in-memory counter per API key
A single-server in-memory counter handles the basic rate check but silently fails the moment a second gateway node is deployed.
The simplest rate limiter runs on a single server with an in-memory hash map from API key to request count. No external dependencies, no network calls.
Components:
- Client: Makes API requests to the gateway server.
- API Gateway (with in-memory map): On each request, increments the counter for the requesting API key. If the counter exceeds the configured limit for the current window, returns 429. Otherwise, forwards to the backend.
- Backend Service: Handles business logic. Only sees requests that passed the rate limit check.
Request walkthrough:
- Client sends a request with an API key header.
- Gateway extracts the API key from the `X-API-Key` header.
- Gateway looks up the current counter for `api_key` in the local hash map.
- If the counter exceeds the configured limit, return 429 with `Retry-After`.
- Otherwise, increment the counter and forward the request to the Backend Service.
- A background goroutine (or scheduled task) resets counters at each window boundary.
This design works on one server and adds zero latency overhead. The failure is obvious: deploy two gateway instances and each maintains its own counter. A client sending 200 requests per minute routes through round-robin load balancing, hitting each instance with 100 requests per minute.
Neither counter exceeds the 100-request limit. The client pushes 200 requests per minute through a nominal 100-request limit, and neither server knows. The rate limiter is silently broken.
I like to pause here in an interview and let this sink in. The single-server version is correct. The multi-server version is wrong in a way that no monitoring will catch, because every node thinks it is enforcing the limit. This is the kind of failure that surfaces only in load testing or incident review.
2. Distributed tracking with a centralized Redis counter
Moving counters to a shared Redis instance gives all gateway nodes a consistent global view, solving the distributed counting problem.
Move the counters out of each server's memory and into a shared Redis instance. All gateway nodes read from and write to the same counter state.
Components:
- Load Balancer: Distributes incoming requests across the API Gateway fleet. No rate-limiting logic here; it is purely a traffic distributor.
- API Gateway Fleet: Multiple stateless gateway nodes. All share a single Redis connection pool. Rate limit state is no longer local to any node.
- Redis Counter Store: Holds one key per `(api_key, window)` pair. Each key is an integer incremented atomically with Redis `INCR`. The TTL equals the window duration so old window keys expire automatically without any cleanup job.
- Backend Service: Receives only requests that pass the centralized counter check.
Request walkthrough:
- Client sends a request.
- Load Balancer routes to any gateway node (round-robin or least-connections).
- Gateway constructs the Redis key: `rl:{api_key}:{window_start}` where `window_start = floor(now / window_seconds) * window_seconds`.
- Gateway calls `INCR rl:{api_key}:{window_start}`. Redis atomically increments and returns the new count.
- If the returned count exceeds the configured limit, return 429.
- Otherwise, forward to the Backend Service.
- On the first INCR for a new window (count == 1), Gateway also calls `EXPIRE rl:{api_key}:{window_start} {window_seconds}` to attach a TTL.
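The gateway-side check reduces to one atomic round-trip. A sketch under stated assumptions: `redis_client` is anything exposing redis-py style `incr()`/`expire()`, and the `FakeRedis` stub below exists only to exercise the logic without a live server:

```python
def check_rate_limit(redis_client, api_key: str, limit: int,
                     window_seconds: int, now: float) -> bool:
    """Fixed-window check against a shared counter store."""
    window_start = int(now // window_seconds) * window_seconds
    key = f"rl:{api_key}:{window_start}"
    count = redis_client.incr(key)   # atomic read-modify-write in one command
    if count == 1:                   # first hit in this window:
        redis_client.expire(key, window_seconds)  # attach the TTL
    return count <= limit

class FakeRedis:
    """In-memory stand-in for Redis, for illustration only."""
    def __init__(self):
        self.store = {}
    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]
    def expire(self, key, ttl):
        pass  # TTL bookkeeping omitted in this stub

r = FakeRedis()
print([check_rate_limit(r, "abc", limit=2, window_seconds=60, now=30)
       for _ in range(3)])  # [True, True, False]
```

Note the gap between `INCR` and `EXPIRE`: if the gateway crashes between the two calls, the key never expires. Making the pair atomic is one motivation for the Lua-script approach raised in the deep dives.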
All gateway nodes share the same Redis key namespace, so counts are globally accurate across the fleet. This is the baseline design; it solves the distributed counting problem cleanly and is where most teams start. The remaining questions are which windowing algorithm to use, how to handle Redis failures, and how to resolve rules per tier and endpoint.
All of those go to the deep dives.
3. Configurable rules per tier and endpoint
A Rules Service with per-node in-memory caching adds configurable per-tier, per-endpoint rules with zero added latency per request.
Not every client has the same limit. A free-tier API key gets 100 req/min on /search; a premium key gets 1,000. These rules must be configurable at runtime without a code deploy, and they must not add a database query to every request path.
Components:
- Rules Service: A separate service that owns the `LimitRule` table and exposes a CRUD management API. Publishes rule changes to a pub/sub channel so gateway nodes can refresh promptly.
- Rules Cache (in-memory per node): Each Gateway node caches the rules map in local memory, refreshed every 60 seconds from the Rules Service. A rule lookup is now a hash map read (CPU-bound), not a network round-trip.
- Redis Counter Store: Unchanged. The key now includes the endpoint to support per-endpoint limits: `rl:{api_key}:{endpoint}:{window_start}`.
Request walkthrough:
- Client sends a request with an API key header.
- Gateway resolves the `ClientIdentity`: extract API key from header, look up the tier from a cached key-to-tier map.
- Gateway resolves the applicable `LimitRule` from the local in-memory rules cache using `(key_tier, endpoint)`.
- If no explicit rule exists, fall back to the global default rule (for example, 60 req/min for free-tier traffic).
- Gateway calls `INCR rl:{api_key}:{endpoint}:{window_start}` in Redis.
- If the returned count exceeds the rule's limit, return 429 with `Retry-After` and `X-RateLimit-*` headers.
- Otherwise, forward to the Backend Service.
The 60-second local cache on each gateway node means a newly configured rule takes up to 60 seconds to propagate to all nodes. For a rate limiter this is acceptable: the window is typically 60 seconds, so at most one window is enforced under the old rule. For security-critical changes (revoking a compromised API key), add a pub/sub invalidation channel that triggers immediate cache refresh on all nodes.
This tradeoff is worth calling out explicitly when discussing rule distribution consistency. In production, I have seen teams skip the pub/sub channel and rely solely on the 60-second TTL refresh, then get burned when a key revocation takes a full minute to propagate during a security incident.
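Rule resolution itself is a dictionary lookup with a default fallback. A minimal sketch, assuming the cache is a plain dict keyed on `(tier, endpoint)` and the default values mirror the 60 req/min free-tier example above (names are invented for illustration):

```python
DEFAULT_RULE = {"limit": 60, "window_seconds": 60}  # global fallback rule

def resolve_rule(rules_cache: dict, tier: str, endpoint: str) -> dict:
    """Resolve the applicable LimitRule from the per-node in-memory cache.

    The cache is refreshed every 60 seconds from the Rules Service, so this
    lookup never leaves the process; missing combinations fall back to the
    global default rather than failing the request.
    """
    return rules_cache.get((tier, endpoint), DEFAULT_RULE)

cache = {
    ("free", "/search"):    {"limit": 100,  "window_seconds": 60},
    ("premium", "/search"): {"limit": 1000, "window_seconds": 60},
}
print(resolve_rule(cache, "premium", "/search")["limit"])  # 1000
print(resolve_rule(cache, "free", "/unknown")["limit"])    # 60 (default)
```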
Potential Deep Dives
1. Which rate-limiting algorithm should we use?
The baseline design uses a fixed window counter, keyed on {window_start}. This is fast and simple, but it has a well-known spike problem. The deep dive is about whether to replace it with something more accurate.
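The spike problem is easy to reproduce with the fixed-window logic itself (an illustrative sketch): a client that sends its full allowance at the end of one window and again at the start of the next pushes twice the nominal rate through a roughly one-second span, and every request is accepted.

```python
def fixed_window_allows(counters: dict, key: str, now: float,
                        limit: int, window_seconds: int) -> bool:
    """Minimal fixed-window check, used to demonstrate the boundary spike."""
    window_start = int(now // window_seconds) * window_seconds
    bucket = (key, window_start)
    counters[bucket] = counters.get(bucket, 0) + 1
    return counters[bucket] <= limit

counters = {}
limit, window = 100, 60
# 100 requests at t=59.5 (the last half-second of window 0) all pass...
late_burst = sum(fixed_window_allows(counters, "k", 59.5, limit, window)
                 for _ in range(100))
# ...and 100 more at t=60.5 (the first half-second of window 1) also all pass.
early_burst = sum(fixed_window_allows(counters, "k", 60.5, limit, window)
                  for _ in range(100))
print(late_burst + early_burst)  # 200 accepted in ~1 second against a 100/min limit
```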
2. How do we store counters at distributed scale?
The choice of counter storage determines both the accuracy of global limits and the throughput ceiling of the rate limiter as a whole.
3. How do we handle Redis failures without cascading 429s?
Redis is the shared state backbone for the rate limiter. When it goes down, every rate limit check fails. The decision of what to do with those requests during the outage determines whether you have a graceful degradation or a complete service outage.
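The fail-open decision can be sketched as a thin wrapper around the Redis check. This is a simplified illustration under stated assumptions (the class name, the bare per-node counter, and the emergency limit are invented for this example; a real circuit breaker would also track CLOSED/OPEN/HALF-OPEN state and window the fallback counter):

```python
class FailOpenChecker:
    """Wraps the shared-counter check with fail-open behavior."""

    def __init__(self, redis_check, emergency_limit: int):
        self.redis_check = redis_check        # callable hitting Redis
        self.emergency_limit = emergency_limit
        self.local_count = 0                  # per-node fallback counter

    def allow(self, api_key: str) -> bool:
        try:
            return self.redis_check(api_key)
        except ConnectionError:
            # Fail open: degrade to a permissive local limit rather than
            # returning 429 for all traffic during the Redis outage.
            self.local_count += 1
            return self.local_count <= self.emergency_limit

def broken_redis_check(api_key: str) -> bool:
    raise ConnectionError("redis unreachable")

checker = FailOpenChecker(broken_redis_check, emergency_limit=1000)
print(checker.allow("abc"))  # True: traffic still flows while Redis is down
```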
4. How do we identify clients and resolve rules across tiers?
The rate limiter needs to know who is making a request and which rule applies. This sounds simple but has edge cases that repeatedly cause production incidents, especially around shared IPs and anonymous traffic paths.
Final Architecture
The key insight: every request pays for exactly one in-memory hash map lookup (rules resolution) and one atomic Redis round-trip (Lua sliding window check). The circuit breaker ensures the gateway continues serving traffic when Redis is unavailable. Accuracy degrades gracefully from exact (Redis healthy) to approximately permissive (local emergency limiter) rather than from exact to a service outage.
Interview Cheat Sheet
- Scope to four core behaviors up front: per-key limits, 429 with headers, per-tier rules configurable at runtime, and no code deploy required to update rules.
- State the read/write ratio explicitly: every incoming request triggers one Redis INCR. The rate limiter is write-heavy on the counter store, unlike most systems where reads dominate.
- Redis INCR is atomic. It reads, increments, and returns the new value in a single operation. No distributed lock is needed around it, ever.
- Fixed window counter has a boundary spike bug: a client can send double the configured rate across a window boundary. Always name this when asked about algorithm choices; it signals you know the failure mode.
- Sliding window counter beats fixed window with no memory overhead. Blend two adjacent INCR counters weighted by elapsed fraction. A 5-10% overshoot in high-concurrency bursts is acceptable and should be stated in the NFRs.
- Sliding window log is more accurate but stores one sorted set entry per request. Use it only for billing-grade quota enforcement where exact limits are contractually guaranteed.
- Use a Lua script on Redis to make the window check and INCR fully atomic in one round-trip. Use EVALSHA (not EVAL) to avoid resending the script body on every request.
- Use curly-brace key tagging (`rl:{api_key}:current` and `rl:{api_key}:prev`) so both window keys always land on the same Redis Cluster shard. Lua scripts in Redis Cluster can only touch keys on one shard.
- Identity resolution priority: API key first (most specific, carries tier), user ID second (authenticated session), IP third (anonymous, tightest limit). Never downgrade to IP when a higher-priority identity is present.
- Cache rules in memory on each gateway node with a 60-second TTL. A rule lookup is an O(1) hash map read, not a network call. Zero added latency per request for rule resolution.
- Fail-open, never fail-closed. When Redis is unavailable, a per-node circuit breaker trips and the gateway passes traffic through a local emergency limiter set to 10x the normal limit. Service stays up; rate limit accuracy degrades temporarily.
- Redis Sentinel provides HA: 1 primary and 2 replicas per shard, with Sentinel promoting a replica in under 30 seconds on primary failure. This minimizes the time the circuit breaker spends in the OPEN state.
- Emit `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` on every response including successful ones. Well-behaved clients use these to self-throttle before hitting the limit.
- For security-critical key revocations, add a pub/sub invalidation channel to push immediate cache evictions to all gateway nodes rather than waiting for the 60-second TTL refresh.
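The sliding-window blend mentioned in the cheat sheet reduces to one line of arithmetic: weight the previous window's counter by how much of it still overlaps the trailing window, then add the current counter. A sketch (the function name is invented):

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            now: float, window_seconds: int) -> float:
    """Estimate the request count over the trailing window.

    Assumes the previous window's requests were evenly spread, which is
    the source of the 5-10% overshoot tolerance stated in the NFRs.
    """
    elapsed_fraction = (now % window_seconds) / window_seconds
    return prev_count * (1 - elapsed_fraction) + curr_count

# 15 seconds into a 60s window: 75% of the previous window still counts.
print(sliding_window_estimate(prev_count=80, curr_count=30,
                              now=75, window_seconds=60))
# 80 * 0.75 + 30 = 90.0
```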