Ride Matching
Design the driver dispatch engine that matches a rider's request to the nearest available driver in milliseconds, covering real-time geospatial indexing, conflict prevention, and the rebalancing challenges of a global fleet.
What is a ride-matching system?
A ride-matching system connects a rider's pickup request to the nearest available driver. The apparent simplicity hides two hard engineering problems: you need a real-time geospatial index that tracks millions of continuously moving drivers with sub-second freshness, and you need to atomically claim a driver the instant you select them so two riders can never be routed to the same car. I like this question in interviews because it forces you to solve geospatial indexing, atomic concurrency control, and high-throughput location ingestion all in one sitting.
Functional Requirements
Core Requirements
- A rider requests a ride at a given pickup location.
- The system finds the nearest available driver and sends them a request.
- If the driver accepts, both sides receive each other's contact information and ETA.
- If the driver declines or times out, the system tries the next nearest driver.
Scope Exclusions
- Routing and turn-by-turn navigation (covered in Design an ETA Service).
- Dynamic surge pricing (covered in Design a Surge Pricing System).
Non-Functional Requirements
Core Requirements
- Match latency: A matched driver is returned to the rider in under 200ms from the moment the ride request is received.
- Availability: 99.99% uptime. Availability over consistency: a driver location that is 3 seconds stale is acceptable, a failed match is not.
- Location freshness: Driver positions are current to within 5 seconds at all times. This sets the GPS update interval at 4 seconds per driver.
- Location write throughput: 1M concurrently active drivers each sending GPS updates every 4 seconds equals 250K location writes per second at peak.
- Consistency: Each driver is atomically claimed by exactly one rider. No double-booking, even under concurrent match requests.
- Scale: 5M registered drivers globally. 2K trip requests per second during peak periods.
Below the Line
- Real-time driver tracking during an active trip (separate WebSocket streaming problem)
- Trip lifecycle management and status persistence
- Fraud detection and driver eligibility filtering
Read/write ratio: Location writes dominate: 250K writes per second at peak versus roughly 2K trip-request queries per second. That is a 125:1 write-to-read ratio on the location path. Each match query is a geospatial scan (read-intensive per request) but low-rate relative to the location write firehose.
I'd call out this ratio early in an interview because it changes everything downstream. The write volume determines how the location ingestion pipeline is built; the query access pattern determines the geospatial index structure. They require separate services with separate scaling axes.
Real-time trip tracking is below the line because it does not touch the matching pipeline. To add it: open a WebSocket connection between both apps on trip acceptance and relay driver GPS updates through Redis Pub/Sub on a trip-specific channel. The Location Service publishes to that channel whenever it receives an update for a driver on an active trip.
Trip lifecycle management is below the line because it runs after the match is made. To add it: introduce a Trip Service that owns the state machine (requested, accepted, in_progress, completed) and persists each state transition to a PostgreSQL table with timestamps for audit and billing.
The hardest problem in scope: Atomically removing a driver from the available pool the moment they are matched, preventing two concurrent ride requests from being routed to the same driver. A naive SELECT-then-UPDATE is a TOCTOU race condition that fires constantly in production at this request rate. The atomic claim is where bugs hide.
Core Entities
- Rider: A registered user with a
rider_idwho can request rides. Pickup location is captured at request time, not stored as a persistent field. - Driver: A registered driver with
driver_id,status(available, matched, on_trip, offline), and a linked vehicle record. - DriverLocation: The current GPS snapshot for a driver:
driver_id,latitude,longitude,updated_at. Ephemeral, stored in Redis, not written to the primary DB on every update. - RideRequest: A pending match record with
request_id,rider_id,pickup_lat,pickup_lng,status(pending, offer_sent, matched, expired), and acandidate_queueof driver IDs to try in order. - RideOffer: An in-flight offer from the system to a specific driver:
offer_id,request_id,driver_id,sent_at,expires_at. Expires after 10 seconds if no driver response arrives.
Full schema details (indexes, partition keys, TTLs) are deferred to the deep dives. These five entities are sufficient to drive the API design and High-Level Design.
API Design
I keep the API surface minimal: two actors (rider, driver), four action types.
FR 1 - Rider requests a ride:
POST /rides
Body: { pickup_lat, pickup_lng }
Response: { ride_id, status: "matching" }
The rider gets a ride_id immediately and polls GET /rides/{ride_id} for a driver assignment. The server returns before a driver is found so match latency is invisible to the caller.
FR 2 - Driver broadcasts location (enables matching):
POST /drivers/location
Body: { latitude, longitude, status }
Response: 200 OK
Called every 4 seconds per active driver. The response body is empty; acknowledging receipt is sufficient. The service must sustain 250K concurrent callers at peak.
FR 3 and FR 4 - Driver responds to an offer:
PUT /rides/{ride_id}/respond
Body: { driver_id, decision: "accept" | "decline" }
Response: { status, rider_contact? }
One endpoint handles both accept and decline. The branching logic belongs in the service, not in the URL. On accept, rider_contact carries the rider's name and phone number.
Rider polls for match status:
GET /rides/{ride_id}
Response: { ride_id, status, driver?: { driver_id, name, phone, eta_seconds } }
Polling is sufficient for the HLD. In a production system I would evolve this to a WebSocket opened on the POST response, with the server pushing status changes as they occur.
High-Level Design
1. Rider requests a ride
The write path: the rider submits pickup coordinates, the Match Service creates a ride record, and returns a ride ID. Driver matching starts asynchronously.
Components:
- Rider App: Mobile client sending
POST /rides. - Match Service: Validates the request, creates the ride record, and queues the match attempt.
- Ride DB: Stores the authoritative
RideRequestrecord. Status begins aspending.
Request walkthrough:
- Rider app sends
POST /rideswith pickup lat/lng. - Match Service validates the coordinates (valid lat/lng range).
- Match Service inserts
{ request_id, rider_id, pickup_lat, pickup_lng, status: "pending" }into Ride DB. - Match Service returns
{ ride_id, status: "matching" }to the rider. - Rider begins polling
GET /rides/{ride_id}every 2 seconds.
This covers the write path only. The matching step that queries available drivers is in the next section.
2. Geospatial driver discovery
Drivers broadcast GPS every 4 seconds. The system must answer "which drivers are within 2km of this pickup?" in under 50ms to hold total match latency under 200ms.
A plain SQL range query scans the full drivers table on every request. At 1M active drivers, the optimizer picks one dimension and range-scans on the other even with a composite (lat, lng) index. I've seen this exact failure mode in production: the query plan looks fine in testing with 10K rows, then collapses under load when the table hits seven figures.
The fix is a geospatial index in Redis using GEOADD and GEORADIUS. Redis holds all available drivers in memory, and GEORADIUS returns nearby drivers in under 1ms for 1M entries. I'll treat the specific index structure (Geohash vs H3) as a black box here and cover it in the deep dives.
Components:
- Driver App: Mobile client calling
POST /drivers/locationevery 4 seconds. - Location Service: A separate service that accepts driver GPS updates and writes them to the geospatial index. Decoupled from the Match Service so location write throughput does not affect match latency.
- Redis Geo (Driver Index): Available drivers stored with geohash-encoded scores.
GEORADIUSreturns members within a given radius sorted by distance.
Request walkthrough (location update):
- Driver app sends
POST /drivers/locationwith lat, lng, status. - Location Service validates coordinates and checks driver status.
- If
status = available:GEOADD drivers:available <lng> <lat> <driver_id>. - If
status = matchedoroffline:ZREM drivers:available <driver_id>. - Location Service returns 200 OK.
Request walkthrough (driver discovery on ride request):
- Match Service receives a new
RideRequest. - Match Service calls
GEORADIUS drivers:available <pickup_lng> <pickup_lat> 2 km ASC COUNT 10. - Redis returns up to 10 driver IDs sorted by distance.
- Match Service stores these as the
candidate_queuefor this request.
This diagram adds the Location Service and Redis Geo index. The atomic matching step that prevents double-booking is the next section.
3. Driver matching with conflict prevention
Two riders can submit requests at the same millisecond and both receive the same nearest driver from GEORADIUS before either has claimed that driver. A naive check-then-set race corrupts every booking at high concurrency.
The naive approach runs two separate operations:
- Read driver status:
SELECT status FROM drivers WHERE driver_id = X - If available:
UPDATE drivers SET status = matched WHERE driver_id = X
Between steps 1 and 2, another Match Service instance can execute the same two operations for a different rider. Both instances see available and both believe they have claimed the driver. This is a TOCTOU bug that fires at 2K match requests per second.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.