Ride Matching
Design the driver dispatch engine that matches a rider's request to the nearest available driver in milliseconds, covering real-time geospatial indexing, conflict prevention, and the rebalancing challenges of a global fleet.
What is a ride-matching system?
A ride-matching system connects a rider's pickup request to the nearest available driver. The apparent simplicity hides two hard engineering problems: you need a real-time geospatial index that tracks millions of continuously moving drivers with sub-second freshness, and you need to atomically claim a driver the instant you select them so two riders can never be routed to the same car. I like this question in interviews because it forces you to solve geospatial indexing, atomic concurrency control, and high-throughput location ingestion all in one sitting.
Functional Requirements
Core Requirements
- A rider requests a ride at a given pickup location.
- The system finds the nearest available driver and sends them a request.
- If the driver accepts, both sides receive each other's contact information and ETA.
- If the driver declines or times out, the system tries the next nearest driver.
Scope Exclusions
- Routing and turn-by-turn navigation (covered in Design an ETA Service).
- Dynamic surge pricing (covered in Design a Surge Pricing System).
Non-Functional Requirements
Core Requirements
- Match latency: A matched driver is returned to the rider in under 200ms from the moment the ride request is received.
- Availability: 99.99% uptime. Availability over consistency: a driver location that is 3 seconds stale is acceptable; a failed match is not.
- Location freshness: Driver positions are current to within 5 seconds at all times. This sets the GPS update interval at 4 seconds per driver.
- Location write throughput: 1M concurrently active drivers each sending GPS updates every 4 seconds equals 250K location writes per second at peak.
- Consistency: Each driver is atomically claimed by exactly one rider. No double-booking, even under concurrent match requests.
- Scale: 5M registered drivers globally. 2K trip requests per second during peak periods.
Below the Line
- Real-time driver tracking during an active trip (separate WebSocket streaming problem)
- Trip lifecycle management and status persistence
- Fraud detection and driver eligibility filtering
Read/write ratio: Location writes dominate: 250K writes per second at peak versus roughly 2K trip-request queries per second. That is a 125:1 write-to-read ratio on the location path. Each match query is a geospatial scan (read-intensive per request) but low-rate relative to the location write firehose.
I'd call out this ratio early in an interview because it changes everything downstream. The write volume determines how the location ingestion pipeline is built; the query access pattern determines the geospatial index structure. They require separate services with separate scaling axes.
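The back-of-envelope arithmetic behind these numbers is worth being able to reproduce on the spot. A quick sketch (values taken directly from the requirements above):

```python
# Derive the 250K writes/sec and the 125:1 write-to-read ratio.
active_drivers = 1_000_000       # concurrently active drivers at peak
update_interval_s = 4            # GPS ping cadence per driver
trip_requests_per_s = 2_000      # peak match-query rate

location_writes_per_s = active_drivers // update_interval_s
write_to_read_ratio = location_writes_per_s // trip_requests_per_s

print(location_writes_per_s)   # 250000
print(write_to_read_ratio)     # 125
```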
Real-time trip tracking is below the line because it does not touch the matching pipeline. To add it: open a WebSocket connection between both apps on trip acceptance and relay driver GPS updates through Redis Pub/Sub on a trip-specific channel. The Location Service publishes to that channel whenever it receives an update for a driver on an active trip.
Trip lifecycle management is below the line because it runs after the match is made. To add it: introduce a Trip Service that owns the state machine (requested, accepted, in_progress, completed) and persists each state transition to a PostgreSQL table with timestamps for audit and billing.
The hardest problem in scope: Atomically removing a driver from the available pool the moment they are matched, preventing two concurrent ride requests from being routed to the same driver. A naive SELECT-then-UPDATE is a TOCTOU race condition that fires constantly in production at this request rate. The atomic claim is where bugs hide.
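To make the atomic-claim semantics concrete, here is a minimal Python sketch. A lock stands in for Redis's single-threaded command loop, and a read-and-delete `getdel` stands in for the Redis `GETDEL` command; all names are illustrative, not the real service code.

```python
import threading

class AtomicClaimStore:
    """Simulates the per-driver availability keys with GETDEL semantics."""

    def __init__(self):
        self._lock = threading.Lock()  # models Redis's serialized command loop
        self._available = {}

    def mark_available(self, driver_id, state):
        with self._lock:
            self._available[driver_id] = state

    def getdel(self, driver_id):
        # Read-and-delete in one serialized step: only one caller
        # can ever receive a non-None result for a given key.
        with self._lock:
            return self._available.pop(driver_id, None)

store = AtomicClaimStore()
store.mark_available("driver-42", {"lat": 40.7, "lng": -74.0})

winners = []

def try_claim(rider_id):
    # Many concurrent match attempts race for the same driver.
    if store.getdel("driver-42") is not None:
        winners.append(rider_id)

threads = [threading.Thread(target=try_claim, args=(f"rider-{i}",)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(winners))  # exactly 1: the atomic claim admits a single winner
```

A naive SELECT-then-UPDATE has no equivalent guarantee: between the read and the write, another instance can observe the same "available" state.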
Core Entities
- Rider: A registered user with a `rider_id` who can request rides. Pickup location is captured at request time, not stored as a persistent field.
- Driver: A registered driver with `driver_id`, `status` (available, matched, on_trip, offline), and a linked vehicle record.
- DriverLocation: The current GPS snapshot for a driver: `driver_id`, `latitude`, `longitude`, `updated_at`. Ephemeral, stored in Redis, not written to the primary DB on every update.
- RideRequest: A pending match record with `request_id`, `rider_id`, `pickup_lat`, `pickup_lng`, `status` (pending, offer_sent, matched, expired), and a `candidate_queue` of driver IDs to try in order.
- RideOffer: An in-flight offer from the system to a specific driver: `offer_id`, `request_id`, `driver_id`, `sent_at`, `expires_at`. Expires after 10 seconds if no driver response arrives.
Full schema details (indexes, partition keys, TTLs) are deferred to the deep dives. These five entities are sufficient to drive the API design and High-Level Design.
API Design
I keep the API surface minimal: two actors (rider, driver), four action types.
FR 1 - Rider requests a ride:
POST /rides
Body: { pickup_lat, pickup_lng }
Response: { ride_id, status: "matching" }
The rider gets a ride_id immediately and polls GET /rides/{ride_id} for a driver assignment. The server returns before a driver is found so match latency is invisible to the caller.
FR 2 - Driver broadcasts location (enables matching):
POST /drivers/location
Body: { latitude, longitude, status }
Response: 200 OK
Called every 4 seconds by each active driver. The response body is empty; acknowledging receipt is sufficient. The service must sustain 250K requests per second from 1M concurrently connected drivers at peak.
FR 3 and FR 4 - Driver responds to an offer:
PUT /rides/{ride_id}/respond
Body: { driver_id, decision: "accept" | "decline" }
Response: { status, rider_contact? }
One endpoint handles both accept and decline. The branching logic belongs in the service, not in the URL. On accept, rider_contact carries the rider's name and phone number.
Rider polls for match status:
GET /rides/{ride_id}
Response: { ride_id, status, driver?: { driver_id, name, phone, eta_seconds } }
Polling is sufficient for the HLD. In a production system I would evolve this to a WebSocket opened on the POST response, with the server pushing status changes as they occur.
High-Level Design
1. Rider requests a ride
The write path: the rider submits pickup coordinates, the Match Service creates a ride record, and returns a ride ID. Driver matching starts asynchronously.
Components:
- Rider App: Mobile client sending `POST /rides`.
- Match Service: Validates the request, creates the ride record, and queues the match attempt.
- Ride DB: Stores the authoritative `RideRequest` record. Status begins as `pending`.
Request walkthrough:
- Rider app sends `POST /rides` with pickup lat/lng.
- Match Service validates the coordinates (valid lat/lng range).
- Match Service inserts `{ request_id, rider_id, pickup_lat, pickup_lng, status: "pending" }` into Ride DB.
- Match Service returns `{ ride_id, status: "matching" }` to the rider.
- Rider begins polling `GET /rides/{ride_id}` every 2 seconds.
This covers the write path only. The matching step that queries available drivers is in the next section.
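The write path above can be sketched as a single handler. This is a hypothetical illustration, with an in-memory dict standing in for the Ride DB; all names are assumptions, not the actual service code.

```python
import uuid

ride_db = {}  # stand-in for the authoritative RideRequest table

def valid_coords(lat, lng):
    # Step 2 of the walkthrough: reject out-of-range coordinates.
    return -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0

def create_ride_request(rider_id, pickup_lat, pickup_lng):
    """FR1 write path: validate, persist as pending, return immediately.

    Matching runs asynchronously after this returns, so the rider is
    never blocked on driver discovery.
    """
    if not valid_coords(pickup_lat, pickup_lng):
        raise ValueError("pickup coordinates out of range")
    request_id = str(uuid.uuid4())
    ride_db[request_id] = {
        "request_id": request_id,
        "rider_id": rider_id,
        "pickup_lat": pickup_lat,
        "pickup_lng": pickup_lng,
        "status": "pending",
    }
    return {"ride_id": request_id, "status": "matching"}
```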
2. Geospatial driver discovery
Drivers broadcast GPS every 4 seconds. The system must answer "which drivers are within 2km of this pickup?" in under 50ms to hold total match latency under 200ms.
A plain SQL range query scans the full drivers table on every request. At 1M active drivers, the optimizer picks one dimension and range-scans on the other even with a composite (lat, lng) index. I've seen this exact failure mode in production: the query plan looks fine in testing with 10K rows, then collapses under load when the table hits seven figures.
The fix is a geospatial index in Redis using GEOADD and GEORADIUS. Redis holds all available drivers in memory, and GEORADIUS returns nearby drivers in under 1ms for 1M entries. I'll treat the specific index structure (Geohash vs H3) as a black box here and cover it in the deep dives.
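Treating the cell encoding as a black box, the bucket-and-ring idea behind the Redis geo index can be sketched in plain Python. The grid size and the unprojected distance math are illustrative simplifications, not what Redis does internally.

```python
import math
from collections import defaultdict

# Bucket drivers into fixed-size grid cells (a crude analogue of
# geohash/H3 cells), then answer radius queries by scanning the 3x3
# ring of cells around the pickup instead of every driver.
CELL_DEG = 0.02  # roughly 2 km per cell at mid latitudes (illustrative)

index = defaultdict(set)  # cell -> available driver_ids
positions = {}            # driver_id -> (lat, lng)

def cell_of(lat, lng):
    return (int(lat // CELL_DEG), int(lng // CELL_DEG))

def add_driver(driver_id, lat, lng):
    # GEOADD analogue: record the driver's cell and position.
    index[cell_of(lat, lng)].add(driver_id)
    positions[driver_id] = (lat, lng)

def nearby(lat, lng, count=10):
    # GEORADIUS analogue: gather candidates from surrounding cells,
    # then sort by approximate distance to the pickup.
    ci, cj = cell_of(lat, lng)
    candidates = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for driver_id in index.get((ci + di, cj + dj), ()):
                dlat, dlng = positions[driver_id]
                candidates.append((math.hypot(dlat - lat, dlng - lng), driver_id))
    candidates.sort()
    return [driver_id for _, driver_id in candidates[:count]]
```

The query touches only a handful of cells regardless of fleet size, which is why the real index answers in under a millisecond where a table scan collapses.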
Components:
- Driver App: Mobile client calling `POST /drivers/location` every 4 seconds.
- Location Service: A separate service that accepts driver GPS updates and writes them to the geospatial index. Decoupled from the Match Service so location write throughput does not affect match latency.
- Redis Geo (Driver Index): Available drivers stored with geohash-encoded scores. `GEORADIUS` returns members within a given radius sorted by distance.
Request walkthrough (location update):
- Driver app sends `POST /drivers/location` with lat, lng, status.
- Location Service validates coordinates and checks driver status.
- If `status = available`: `GEOADD drivers:available <lng> <lat> <driver_id>`.
- If `status = matched` or `offline`: `ZREM drivers:available <driver_id>`.
- Location Service returns 200 OK.
Request walkthrough (driver discovery on ride request):
- Match Service receives a new `RideRequest`.
- Match Service calls `GEORADIUS drivers:available <pickup_lng> <pickup_lat> 2 km ASC COUNT 10`.
- Redis returns up to 10 driver IDs sorted by distance.
- Match Service stores these as the `candidate_queue` for this request.
This diagram adds the Location Service and Redis Geo index. The atomic matching step that prevents double-booking is the next section.
3. Driver matching with conflict prevention
Two riders can submit requests at the same millisecond and both receive the same nearest driver from GEORADIUS before either has claimed that driver. Without atomic claiming, this check-then-set race double-books drivers constantly at high concurrency.
The naive approach runs two separate operations:
- Read driver status: `SELECT status FROM drivers WHERE driver_id = X`
- If available: `UPDATE drivers SET status = matched WHERE driver_id = X`
Between steps 1 and 2, another Match Service instance can execute the same two operations for a different rider. Both instances see available and both believe they have claimed the driver. This is a TOCTOU bug that fires at 2K match requests per second.
The fix is a single atomic Redis command. GETDEL drivers:available_status:{driver_id} reads and deletes the key in one atomic step. Redis processes commands sequentially in a single thread, so only the instance that receives a non-nil response has claimed the driver.
Components:
- Match Service (evolved): Executes `GETDEL` on a per-driver status key instead of SELECT then UPDATE.
- Redis Atomic Claim: A separate key namespace (`drivers:available_status:{driver_id}`) holds a JSON blob with driver state. The Location Service writes this key when a driver becomes available; the Match Service deletes it when claiming.
- Redis Offer Store: Tracks the active offer sent to each driver as a key with a 10-second TTL.
Request walkthrough:
- Match Service takes the first driver from the `candidate_queue`.
- Match Service executes `GETDEL drivers:available_status:{driver_id}`.
- If nil: driver was already claimed. Advance to the next candidate.
- If non-nil: driver is claimed. Write `offer:{offer_id}` key with `EX 10` TTL to Redis.
- Match Service pushes the offer to the Driver App via WebSocket.
- Match Service updates `RideRequest` status to `offer_sent` in Ride DB.
The GETDEL operation is O(1) and atomic under Redis's single-threaded command model. No distributed lock is needed. The next section handles driver response and the fallback path.
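The claim loop over the candidate queue reduces to a few lines. A minimal sketch, where `getdel` is any callable with Redis `GETDEL` semantics (return the value and remove it, or return `None` if the key is gone); the dict and key names are illustrative stand-ins.

```python
def claim_first_available(candidate_queue, getdel):
    """Walk candidates in distance order; GETDEL semantics guarantee
    at most one match attempt can win any given driver."""
    for driver_id in candidate_queue:
        state = getdel(driver_id)
        if state is not None:
            return driver_id, state  # atomically claimed
    return None, None  # every candidate was taken; widen the search

# In-memory stand-in for the drivers:available_status:* key namespace.
available = {"d3": {"lat": 1.0, "lng": 2.0}, "d7": {"lat": 1.1, "lng": 2.1}}

# d2 has already been claimed (its key is gone), so d3 wins; d7 is untouched.
winner, state = claim_first_available(
    ["d2", "d3", "d7"], lambda d: available.pop(d, None)
)
```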
4. Driver response and fallback
On acceptance, both parties receive each other's contact info and the ride is confirmed. On decline or timeout, the system advances to the next candidate without the rider noticing the delay. In my experience, the fallback path fires more often than people expect: roughly 30-40% of first-choice drivers decline or time out during peak hours.
Components:
- Driver App: Sends `PUT /rides/{ride_id}/respond` with `decision: accept | decline`.
- Match Service: On acceptance, finalizes the ride and notifies the rider. On decline or timeout, pops the next candidate and repeats the atomic claim flow.
- Timeout Worker: A background process that detects expired `offer:{offer_id}` keys and triggers fallback to the next candidate. Since an expired Redis key is simply gone, detection relies on keyspace expiry notifications or a separate index of offer deadlines rather than a literal scan for expired keys.
Request walkthrough (accept):
- Driver app sends `PUT /rides/{ride_id}/respond` with `decision: accept`.
- Match Service verifies the offer key still exists (not expired).
- Match Service writes `{ ride_id, driver_id, status: accepted }` to Ride DB.
- Match Service sends the rider the driver's name, phone, and ETA.
- Match Service deletes the offer key from Redis.
Request walkthrough (decline or timeout):
- Driver declines, or the 10-second TTL on `offer:{offer_id}` expires.
- Timeout Worker (on TTL expiry) or Match Service (on decline) pops the next driver ID from `candidate_queue`.
- If a candidate exists: repeat the atomic claim flow from step 2 of section 3.
- If no candidates remain: expand the search radius to 5km and re-run GEORADIUS.
- If still no candidates after 30 seconds: return `status: no_drivers_found` to the rider.
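One way to implement the Timeout Worker's deadline tracking is a min-heap of offer expiry times, since Redis deletes expired keys rather than leaving them around to find. A minimal sketch under that assumption (names and the 10-second TTL mirror the offer design above; timestamps are illustrative):

```python
import heapq

offer_deadlines = []  # min-heap of (expires_at, offer_id)

def register_offer(offer_id, sent_at, ttl_seconds=10):
    # Record the deadline at offer-send time, alongside the EX 10 key.
    heapq.heappush(offer_deadlines, (sent_at + ttl_seconds, offer_id))

def pop_expired(now):
    """One Timeout Worker tick: collect offers whose deadline has passed
    so the caller can advance each ride to its next candidate."""
    expired = []
    while offer_deadlines and offer_deadlines[0][0] <= now:
        _, offer_id = heapq.heappop(offer_deadlines)
        expired.append(offer_id)
    return expired

register_offer("offer-1", sent_at=100)  # deadline 110
register_offer("offer-2", sent_at=105)  # deadline 115
# At t=111, only offer-1 has expired; offer-2 expires at t=115.
```

Redis keyspace expiry notifications are the usual alternative; the heap version is simpler to reason about and has no delivery-loss edge cases.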
This covers the full round-trip matching flow. The final architecture diagram assembles all services at production scale with the full location ingestion pipeline.
Potential Deep Dives
1. How do we maintain a real-time geospatial index?
Drivers send GPS updates every 4 seconds. At 1M active drivers, that is 250K writes per second. The index must answer "nearest 10 available drivers within 2km" in under 50ms, with coordinates no more than 5 seconds stale.
2. How do we prevent double-booking a driver?
Two riders submit requests at the same millisecond. Both run GEORADIUS and both receive the same driver as the nearest candidate. Without atomic claiming, both proceed to send offers to the same driver, and that driver gets double-booked. This is a classic production incident in matching systems.
3. How do we handle 250K driver location writes per second?
Location freshness requires GPS updates every 4 seconds from every active driver. At 1M concurrent drivers, that is 250K writes per second, sustained, with no batching opportunity on the driver side.
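While the drivers cannot batch, the consumer side can: drain a slice of the update stream and keep only the newest position per driver before touching the index. A sketch of that batch loop, with a deque standing in for a Kafka partition; batch size and field names are assumptions.

```python
from collections import deque

stream = deque()  # stand-in for a Kafka partition of location updates

def consume_batch(batch_size=500):
    """One Location Consumer iteration: drain up to batch_size updates,
    deduplicate per driver, and return one index write per driver."""
    batch = []
    while stream and len(batch) < batch_size:
        batch.append(stream.popleft())
    latest = {}
    for update in batch:
        # Later updates overwrite earlier ones: a driver who pinged
        # twice inside the batch costs one index write, not two.
        latest[update["driver_id"]] = update
    return latest
```

Deduplication matters because a 4-second ping cadence against a consumer lagging even slightly produces many superseded positions per driver per batch.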
4. How do we scale matching to millions of drivers globally?
A single Redis instance can hold 5M geo index entries, but a data center in Virginia cannot serve a match request from Bangkok within the 200ms latency target. Network round-trip from Southeast Asia to the US adds 120-200ms of baseline latency before any query runs. I save this section for last in interviews because interviewers want to see that you solved correctness before you solved scale.
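The shard-routing idea behind city-cell partitioning can be sketched briefly. The shard names and the coarse 2-degree grid (a stand-in for H3 resolution-5 cells) are illustrative assumptions, not real infrastructure.

```python
import hashlib

SHARDS = ["redis-us-east", "redis-eu-west", "redis-ap-southeast"]

def region_cell(lat, lng, deg=2.0):
    # Coarse stand-in for an H3 resolution-5 cell (~250 km^2 in the text).
    return (int(lat // deg), int(lng // deg))

def shard_for(lat, lng):
    """Route a request to a shard by hashing its region cell.

    Hashing the cell rather than the raw coordinates means every request
    inside a city lands on the same shard, so the geo query stays
    shard-local; only riders near cell boundaries need spillover logic.
    """
    digest = hashlib.sha256(repr(region_cell(lat, lng)).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

In production the shard map would also encode geography (an AP-region cell should hash into an AP-region replica set) so the 120-200ms cross-ocean round trip never enters the hot path.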
Final Architecture
The central architectural insight: the location write path (Driver App, Location Service, Kafka, Location Consumer, Redis H3 Index) and the match read path (Rider App, Match Service, Redis Atomic Claim) are completely decoupled. Every location write goes to Kafka first; the geospatial index lives exclusively in Redis, and the primary database is not on either hot path. This decoupling is what allows 250K location writes per second to coexist with sub-200ms match latency without one path degrading the other.
Interview Cheat Sheet
- A ride-matching system has two independent hot paths: the location write path (drivers broadcasting GPS at 250K writes/second) and the match read path (riders triggering geospatial queries at ~2K/second). Separate them into different services from the start. I open every ride-matching answer with this split because it frames every decision that follows.
- Store available drivers in a Redis H3 index at resolution 9 (approximately 0.1 km² per cell). Use `SADD drivers:cell:{h3_cell}` for writes and query the k=1 ring (7 cells, ~0.7km radius) first before expanding.
- Use tiered H3 ring expansion for match queries: search ring-k1 first, fall back to ring-k2 (~2km, 19 cells), then escalate to coarser resolution-7 cells (~5 km²) for sparse coverage areas.
- Prevent double-booking with Redis `GETDEL` on a per-driver status key (`drivers:available_status:{driver_id}`). Redis serializes commands in a single thread, so only one caller receives a non-nil result. O(1), sub-millisecond, no distributed lock required.
- Drivers report GPS every 4 seconds. At 1M active drivers that is 250K writes per second. Route all location updates through Kafka and have a consumer group update only Redis. Keep the primary database off the location write hot path entirely.
- Use city-cell partitioning for global scale. Map each request to an H3 resolution-5 cell (~250 km²), consistent-hash that to a Redis shard, and build explicit cross-shard spillover for riders near city boundaries.
- Driver offer timeouts use Redis key TTLs. Set `offer:{offer_id}` with `EX 10`. A Timeout Worker detects expired offers (via keyspace expiry notifications or a deadline index, since an expired key simply disappears) and triggers fallback to the next candidate in the queue.
- The candidate queue for each match request is the sorted list from H3 ring expansion. The Match Service pops from this list and attempts `GETDEL` on each candidate. If all candidates fail, widen the search radius.
- Drivers are removed from the available index via the `GETDEL` that claims them. No separate UPDATE statement is needed: deletion from Redis is the atomic claim.
- On decline or timeout: pop the next candidate from the queue and repeat the `GETDEL`. Widen the search from 2km to 5km only after exhausting all candidates in the initial list.
- Match latency budget: H3 ring query under 5ms, GETDEL atomic claim under 1ms, notification push under 50ms. Total budget to confirmed offer delivery is under 200ms.
- PostgreSQL holds the durable audit trail (RideRequest, RideOffer, status transitions) but is not queried during the hot match path. The DB records are written after the match is confirmed, not during it.