📐HowToHLD

Design Uber

Walk through a complete Uber design, from a single trip service to a globally distributed system handling 1M concurrently active drivers, real-time GPS matching, and sub-5-second dispatch.

37 min read · 2026-03-28 · hard · Tags: system-design, ride-hailing, geospatial, real-time, websockets

What is Uber?

Uber is a ride-hailing platform that connects riders who need a trip with nearby drivers. The core flow looks simple: request a ride, match a driver, complete the trip. The hard part is underneath.

The system must continuously track millions of driver GPS coordinates, answer "who is closest to this pickup?" in under 100ms, stream location updates between two strangers in near real-time, and do all of this for hundreds of thousands of concurrent trips without losing a single location update. I start every Uber design by separating the location write path (drivers broadcasting GPS) from the matching read path (riders triggering geospatial queries), because they have almost nothing in common and mixing them creates the worst bottlenecks at scale.

As an interview question, it tests geospatial indexing, real-time streaming at scale, event-driven matching, and atomic concurrency control, making it one of the most concept-dense questions in the interview circuit.


Functional Requirements

Core Requirements

  1. Riders can request a trip by specifying a pickup and dropoff location.
  2. Drivers continuously broadcast their GPS location to the system.
  3. The system matches a requesting rider to the nearest available driver.
  4. Riders and drivers can see each other's real-time location during an active trip.

Below the Line (out of scope)

  • Payments and invoicing
  • Ratings and reviews
  • Scheduled rides and ride types (Pool, Comfort, Black)
  • Driver onboarding and background checks

The hardest part in scope: geospatial matching. Every ride request triggers a query across millions of GPS coordinates to find the nearest available driver. The naive approach (SQL range query on lat/lng columns) collapses at scale. Efficient geospatial indexing is the central engineering problem this article solves.

Payments are below the line because the payment flow (charge, refund, driver payout) runs after trip completion and does not share any infrastructure with matching or location tracking. To add it, I would integrate a payment processor like Stripe and publish a TripCompletedEvent from the Trip Service to a dedicated payments Kafka topic. A Payments Service consumes this event, calculates the fare using trip distance and any surge multiplier, and executes the charge asynchronously.

Ratings and reviews are below the line because they are a separate write-after-trip flow that does not affect the hot paths. To add them, I would store ratings in a Postgres table keyed by (trip_id, rater_id) and compute rolling rating averages in a background job rather than inline.

Scheduled rides require a separate scheduling layer. To add them, I would store scheduled trip requests in a persistent job queue and release them into the normal matching flow shortly before the scheduled pickup time, reusing the matching infrastructure entirely.
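The release mechanism for scheduled rides can be sketched with an in-memory min-heap standing in for the persistent job queue. All names here are illustrative, not from a real scheduler:

```python
import heapq

class ScheduledRideQueue:
    """Holds scheduled trips and releases them into the normal
    matching flow shortly before their pickup time."""

    def __init__(self, lead_time_s=300):
        self.lead_time_s = lead_time_s  # release 5 minutes before pickup
        self._heap = []                 # (pickup_ts, trip_request)

    def schedule(self, pickup_ts, trip_request):
        heapq.heappush(self._heap, (pickup_ts, trip_request))

    def due(self, now_ts):
        """Pop every trip whose release window has arrived; the caller
        feeds each one into the regular TripRequestedEvent pipeline."""
        released = []
        while self._heap and self._heap[0][0] - self.lead_time_s <= now_ts:
            released.append(heapq.heappop(self._heap)[1])
        return released
```

A real implementation would back the heap with a durable store so scheduled trips survive restarts; the release logic stays the same.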


Non-Functional Requirements

Core Requirements

  • Availability: 99.99% uptime. Availability over consistency for matching; a mildly stale driver location is acceptable, but a failed match is not.
  • Match latency: Rider receives a driver assignment within 5 seconds of requesting a trip.
  • Location freshness: Driver locations are current to within 5 seconds at all times.
  • Location write throughput: Support 1M concurrently active drivers, each sending GPS updates every 4 seconds. That is 250K location writes per second at peak.
  • Scale: 5M registered drivers, 15M trips per day. Peak matching throughput of approximately 500 trip requests per second during surge.

Below the Line

  • Sub-second GPS update propagation to rider app during trip
  • Surge pricing computation (important but not part of functional core)

Read/write ratio: Location writes (driver GPS broadcasts) are the dominant workload: 250K writes per second at peak. Trip requests (matching reads) are orders of magnitude lower: ~500 per second peak. But each trip request triggers a geospatial query across 1M+ driver positions. The write volume shapes the location storage architecture; the read access pattern shapes the geospatial index. They pull in different directions, and that tension drives every major design decision in this article.

I target a 5-second match time because GPS freshness means a 4-second update interval is already baked in. Anything beyond one cycle means the system is stalling on matching logic, not waiting for fresh data. The sub-5-second target rules out any matching approach that requires multiple sequential round-trips to the database with no caching.
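The headline throughput numbers above follow from simple arithmetic:

```python
# Back-of-envelope numbers from the non-functional requirements.
active_drivers = 1_000_000        # concurrently active drivers
gps_interval_s = 4                # one GPS update per driver every 4 seconds
location_writes_per_s = active_drivers // gps_interval_s

trips_per_day = 15_000_000
avg_trip_requests_per_s = trips_per_day / 86_400  # seconds per day

print(location_writes_per_s)           # 250000 location writes/s at peak
print(round(avg_trip_requests_per_s))  # 174 trips/s on average, vs ~500/s surge peak
```

The roughly 500:1 gap between location writes and trip requests is what justifies treating them as separate planes with separate storage.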


Core Entities

  • Driver: A registered driver with a current location (latitude, longitude), status (available, on_trip, offline), vehicle details, and a driver_id. The status field gates every matching query.
  • Rider: A registered user with a rider_id who can place trip requests.
  • Trip: A request-to-completion record with trip_id, rider_id, driver_id, pickup_location, dropoff_location, status (requested, accepted, in_progress, completed, cancelled), and created_at.
  • DriverLocation: The current GPS snapshot for a driver: driver_id, latitude, longitude, updated_at. Ephemeral; not a durable historical record.

The schema details (indexes, partition keys, TTLs) are deferred to the deep dives. These four entities are sufficient to drive the API design and High-Level Design.


API Design

I keep the API surface minimal. Two actors (rider, driver) with four distinct action types.

Rider requests a trip:

POST /trips
Body: { pickup_lat, pickup_lng, dropoff_lat, dropoff_lng }
Response: { trip_id, status: "requested", estimated_wait_seconds }

Rider gets trip status and driver location:

GET /trips/{trip_id}
Response: { trip_id, status, driver: { lat, lng, eta_seconds } }

Driver broadcasts location:

POST /drivers/location
Body: { latitude, longitude, status }
Response: 200 OK

Driver accepts a trip offer:

PUT /trips/{trip_id}/accept
Response: { trip_id, pickup_location, rider_name }

Driver updates trip status:

PUT /trips/{trip_id}/status
Body: { status: "in_progress" | "completed" | "cancelled" }
Response: 200 OK

Rider subscribes to real-time driver location:

GET /trips/{trip_id}/live
Upgrade: websocket
Server pushes: { driver_lat, driver_lng, timestamp_ms }
Connection closes when trip status reaches "completed" or "cancelled"

Why HTTP for location updates? At 250K location writes per second, HTTP/2 keep-alive connections amortize connection overhead across many requests. A short-lived HTTP POST per update adds roughly 5ms of latency but keeps the driver app stateless: no persistent WebSocket connection to maintain on mobile networks that regularly drop and reconnect. The alternative (persistent WebSocket from driver to server) reduces per-update overhead but complicates reconnection logic on unreliable mobile connections. I choose HTTP for the driver write path and WebSocket for the rider receive path, since riders need server-pushed updates without polling.


High-Level Design

1. Rider requests a trip

The write path: the rider submits a pickup/dropoff pair, the Trip Service creates a trip record with status requested, and immediately kicks off asynchronous driver matching. The rider receives a trip_id back without waiting for a driver to accept.

Components:

  • Rider App: Mobile client sending the trip request.
  • Trip Service: Validates the request, creates the trip record, and publishes a TripRequestedEvent for the matching pipeline to consume.
  • Trip DB: Stores the authoritative trip record. Status progresses from requested through accepted, in_progress, to completed.

Request walkthrough:

  1. Rider app sends POST /trips with pickup and dropoff coordinates.
  2. Trip Service validates the locations (valid lat/lng range, reachable geocoordinate).
  3. Trip Service inserts { trip_id, rider_id, pickup_location, dropoff_location, status: "requested", created_at } into Trip DB.
  4. Trip Service records the trip request in the surge demand index: ZADD trip_requests:cell:{geohash5(pickup_lat, pickup_lng)} {timestamp_ms} {trip_id} on Redis (consumed by the Surge Worker in deep dive 4).
  5. Trip Service publishes TripRequestedEvent { trip_id, pickup_lat, pickup_lng } to Kafka.
  6. Trip Service returns { trip_id, status: "requested" } to the rider.
flowchart LR
  RA(["👤 Rider App\nMobile client"])
  TS["⚙️ Trip Service\nValidate · INSERT trip (requested)\nPublish TripRequestedEvent\nZADD surge demand index"]
  TDB[("🗄️ Trip DB\ntrip_id · rider_id · pickup\ndropoff · status = requested")]
  KF["📨 Kafka\nTripRequestedEvent\npickup_lat · pickup_lng · trip_id"]

  RA -->|"POST /trips · pickup · dropoff"| TS
  TS -->|"INSERT trip row"| TDB
  TS -->|"Publish TripRequestedEvent"| KF
  TS -->|"{ trip_id, status: requested }"| RA

The matching step that consumes the Kafka event is deferred to requirement 3. For now the trip exists in the database, the rider has a trip_id, and the matching pipeline has the event it needs.
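The walkthrough above can be condensed into a handler sketch. Plain dicts and a list stand in for the real Postgres, Redis, and Kafka clients, and all names are illustrative:

```python
import json
import time
import uuid

class TripService:
    """Sketch of the POST /trips handler. db, redis, and kafka are
    in-memory stand-ins for the real clients."""

    def __init__(self, db, redis, kafka):
        self.db, self.redis, self.kafka = db, redis, kafka

    def create_trip(self, rider_id, pickup_lat, pickup_lng,
                    dropoff_lat, dropoff_lng):
        # 1. Validate coordinates.
        if not (-90 <= pickup_lat <= 90 and -180 <= pickup_lng <= 180):
            raise ValueError("invalid pickup coordinates")
        # 2. Insert the authoritative trip row with status=requested.
        trip_id = str(uuid.uuid4())
        self.db[trip_id] = {"rider_id": rider_id, "status": "requested",
                            "pickup": (pickup_lat, pickup_lng),
                            "dropoff": (dropoff_lat, dropoff_lng)}
        # 3. Record demand for the surge worker (stand-in for the ZADD;
        #    rounding to two decimals here approximates a geohash cell).
        cell = f"{round(pickup_lat, 2)}:{round(pickup_lng, 2)}"
        self.redis.setdefault(f"trip_requests:cell:{cell}", []).append(
            (time.time(), trip_id))
        # 4. Publish the matching event; matching is fully asynchronous.
        self.kafka.append(json.dumps({"type": "TripRequestedEvent",
                                      "trip_id": trip_id,
                                      "pickup_lat": pickup_lat,
                                      "pickup_lng": pickup_lng}))
        # 5. Return immediately; no driver is assigned yet.
        return {"trip_id": trip_id, "status": "requested"}
```

The key property is visible in the shape of the code: the only synchronous work is one insert and two fire-and-forget writes, so the rider's response time never depends on matching.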


2. Drivers broadcast their GPS location

The write path for driver location is entirely separate from the trip request path. Drivers send a GPS update every 4 seconds regardless of whether they are available, on a trip, or transitioning between states. These updates flow into two destinations: a geospatial index for matching queries, and a real-time channel for active-trip tracking.

Components:

  • Driver App: Mobile client sending periodic GPS updates.
  • Location Service: Receives driver location updates and writes them to the geospatial index for available-driver queries. When the driver is on an active trip, it publishes GPS positions to Redis Pub/Sub instead (covered in requirement 4).
  • Redis Geo (Location Store): A geospatially indexed Redis sorted set. Available drivers remain here until they accept a trip or go offline.

Request walkthrough:

  1. Driver app sends POST /drivers/location with current lat/lng and status.
  2. Location Service validates the coordinates and driver status.
  3. If status = available: Location Service calls GEOADD drivers:available <lng> <lat> <driver_id> on Redis.
  4. If status = on_trip or status = offline: Location Service calls ZREM drivers:available <driver_id> to remove from the geospatial index.
  5. Location Service returns 200 OK.
flowchart LR
  DA(["🚗 Driver App\nMobile client\nGPS update every 4s"])
  LS["⚙️ Location Service\nValidate · Write to Redis Geo\nGEOADD (available) or ZREM (on_trip/offline)"]
  RG["⚡ Redis Geo\ndrivers:available\nGeospatially indexed sorted set\nLng/lat per driver_id"]

  DA -->|"POST /drivers/location · lat · lng · status"| LS
  LS -->|"GEOADD or ZREM based on status"| RG
  LS -->|"200 OK"| DA

This diagram covers only the geospatial index write path. Active-trip location streaming via Redis Pub/Sub is introduced in requirement 4. The Redis Geo index is the critical structure: it receives 250K writes per second at peak and must answer "find all available drivers within 5km of this point" in under 10ms. The deep dives address how this scales and what geospatial indexing strategy actually underpins it.
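The status branch in steps 3 and 4 reduces to a few lines. A sketch, with a plain dict standing in for the Redis geo set (names are illustrative):

```python
class LocationService:
    """Sketch of the POST /drivers/location handler. `geo` stands in
    for the Redis drivers:available geo set: driver_id -> (lng, lat)."""

    def __init__(self, geo):
        self.geo = geo

    def update_location(self, driver_id, lat, lng, status):
        if not (-90 <= lat <= 90 and -180 <= lng <= 180):
            raise ValueError("invalid coordinates")
        if status == "available":
            # GEOADD drivers:available <lng> <lat> <driver_id>
            self.geo[driver_id] = (lng, lat)
        else:
            # on_trip / offline: ZREM drivers:available <driver_id>
            self.geo.pop(driver_id, None)
        return 200
```

GEOADD on an existing member overwrites its position, so the every-4-seconds update and the initial insert are the same operation; there is no separate "register driver" step.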


3. System matches rider to nearest available driver

Matching consumes the TripRequestedEvent from Kafka, queries the Redis Geo index for nearby drivers, and assigns the trip to the first driver who accepts. The entire flow is asynchronous from the rider's perspective: the rider polls (or receives a WebSocket push) to learn when a driver is assigned.

Components:

  • Match Worker: Kafka consumer that processes TripRequestedEvent messages. Queries Redis Geo and dispatches offers to candidate drivers.
  • Redis Geo (from requirement 2): Answers geospatial proximity queries for available drivers.
  • Notification Service: Pushes trip offers to specific driver apps (via APNs/FCM push notification or driver WebSocket connection).
  • Trip Service (updated): Handles PUT /trips/{trip_id}/accept from drivers, atomically assigns the trip, and updates Trip DB.

Request walkthrough:

  1. Match Worker consumes TripRequestedEvent { trip_id, pickup_lat, pickup_lng } from Kafka.
  2. Match Worker calls GEOSEARCH drivers:available FROMLONLAT <pickup_lng> <pickup_lat> BYRADIUS 5 km ASC COUNT 5 on Redis.
  3. Match Worker marks the top 5 candidate drivers as pending_offer (an atomic Redis SET ... NX EX 30 per driver; the NX flag skips any driver already holding an offer from another trip).
  4. Notification Service pushes a trip offer to each of the 5 candidate drivers simultaneously.
  5. First driver app sends PUT /trips/{trip_id}/accept.
  6. Trip Service executes a compare-and-swap update: UPDATE trips SET status='accepted', driver_id=? WHERE trip_id=? AND status='requested'. The row count indicates whether this driver won the race.
  7. Trip Service removes the assigned driver from drivers:available via ZREM.
  8. Trip Service pushes a "driver assigned" notification to the rider.
flowchart LR
  KF["📨 Kafka\nTripRequestedEvent\npickup location · trip_id"]
  MW["⚙️ Match Worker\nGEOSEARCH nearby drivers\nDispatch offers to top 5\nAtomic offer lock (30s TTL)"]
  RG["⚡ Redis Geo\ndrivers:available\nGEOSEARCH by radius"]
  NS["⚙️ Notification Service\nAPNs / FCM push or WebSocket\nSimultaneous offer to top 5 candidates"]

  DA(["🚗 Driver App\nMobile client"])
  TS["⚙️ Trip Service\nPUT /trips/{id}/accept\nAtomic assignment · ZREM from available set"]
  TDB[("🗄️ Trip DB\nstatus: requested → accepted\ndriver_id populated")]
  RA(["👤 Rider App\nDriver assigned notification"])

  KF -->|"Consume TripRequestedEvent"| MW
  MW -->|"GEOSEARCH within 5km"| RG
  RG -->|"Sorted list of up to 5 driver_ids"| MW
  MW -->|"Dispatch offer concurrently"| NS
  NS -->|"Push offer to each driver"| DA
  DA -->|"PUT /trips/{trip_id}/accept"| TS
  TS -->|"Atomic: UPDATE status=accepted WHERE status=requested"| TDB
  TS -->|"ZREM drivers:available <driver_id>"| RG
  TS -->|"Push: driver assigned"| RA

The atomic UPDATE WHERE status = requested on Trip DB is the correctness guarantee: if two drivers accept simultaneously, only the first one that wins the DB update gets the trip. The other driver receives a "trip already claimed" response and is returned to available status.

If no driver accepts within 30 seconds, the Match Worker re-queries Redis Geo for a fresh set of candidates (drivers may have moved closer) and repeats the offer dispatch. After three attempts, the trip is marked no_driver_found and the rider is notified to retry. This retry loop must be a scheduled job, not a busy-wait loop, to avoid blocking the worker thread.
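The compare-and-swap claim in step 6 is easy to demonstrate. Here SQLite stands in for Postgres, and the row count tells each driver whether it won the race:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (trip_id TEXT PRIMARY KEY, status TEXT, driver_id TEXT)")
conn.execute("INSERT INTO trips VALUES ('t1', 'requested', NULL)")

def try_accept(trip_id, driver_id):
    """Compare-and-swap: succeeds only while status is still 'requested'."""
    cur = conn.execute(
        "UPDATE trips SET status='accepted', driver_id=? "
        "WHERE trip_id=? AND status='requested'",
        (driver_id, trip_id))
    conn.commit()
    return cur.rowcount == 1  # True only for the driver that won

print(try_accept("t1", "driver-a"))  # True  -- first accept wins
print(try_accept("t1", "driver-b"))  # False -- trip already claimed
```

Because the status predicate is inside the UPDATE itself, the database serializes concurrent accepts without any distributed lock; the losing driver simply sees zero rows updated.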


4. Real-time location tracking during a trip

Once a trip is accepted, both rider and driver need the driver's GPS position updated in near real-time throughout the trip. Polling from the rider app (repeated GET requests every 2 seconds) creates a thundering herd: at 500K concurrent trips, 2-second polling generates 250K HTTP requests per second from riders alone. WebSocket connections eliminate polling entirely; the server pushes updates to the rider as they arrive.

Components:

  • Location Streaming Service: Maintains long-lived WebSocket connections to both rider and driver apps for active trips. Routes driver location updates to the correct rider connection.
  • Redis Pub/Sub: Message bus per active trip. When the Location Service receives a driver GPS update for an active trip, it publishes to channel trip:{trip_id}:location. The Location Streaming Service node holding the rider's connection subscribes to this channel.
  • Location Service (updated): On receiving a driver location update for a driver with status = on_trip, publishes to Redis Pub/Sub and skips the geospatial index write.

Request walkthrough:

  1. On trip acceptance, the rider app opens a WebSocket connection to the Location Streaming Service.
  2. Location Streaming Service subscribes to Redis channel trip:{trip_id}:location.
  3. Driver app continues sending POST /drivers/location (now with status = on_trip).
  4. Location Service publishes { driver_id, lat, lng, timestamp } to trip:{trip_id}:location.
  5. Location Streaming Service node holding the rider's WebSocket receives the published event and pushes it to the rider.
  6. On trip completion, Location Streaming Service unsubscribes from the channel and closes the WebSocket.
flowchart LR
  DA(["🚗 Driver App\nGPS update every 4s · status=on_trip"])
  LS["⚙️ Location Service\nReceive GPS update\nPublish to Redis Pub/Sub (trip channel)"]
  RP["⚡ Redis Pub/Sub\nChannel: trip:{trip_id}:location\nDriver GPS event per update"]
  LSS["⚙️ Location Streaming Service\nSubscribes to trip channel on WebSocket open\nPushes driver GPS to rider WebSocket"]
  RA(["👤 Rider App\nReceives driver location updates\nUpdates map in real time"])

  DA -->|"POST /drivers/location · status=on_trip"| LS
  LS -->|"PUBLISH trip:{trip_id}:location"| RP
  RP -->|"Event delivered to subscriber"| LSS
  LSS -->|"WebSocket push: driver lat/lng"| RA

Redis Pub/Sub delivers messages to all subscribers on a channel. When there are multiple Location Streaming Service nodes (the typical production case), every node subscribes to every channel it has an active WebSocket for. A driver's location update reaches only the node holding that trip's rider connection. This is a fan-out of one at the subscriber level, not across all nodes.
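A minimal in-memory sketch of the channel-per-trip fan-out (a stand-in for Redis Pub/Sub, not the real client API):

```python
from collections import defaultdict

class TripLocationBus:
    """In-memory stand-in for Redis Pub/Sub with one channel per trip."""

    def __init__(self):
        self._subs = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, trip_id, callback):
        # Called by the streaming node when the rider's WebSocket opens.
        self._subs[f"trip:{trip_id}:location"].append(callback)

    def unsubscribe(self, trip_id):
        # Called on trip completion; drops the channel entirely.
        self._subs.pop(f"trip:{trip_id}:location", None)

    def publish(self, trip_id, lat, lng):
        # Called by the Location Service for each on-trip GPS update.
        for cb in self._subs.get(f"trip:{trip_id}:location", []):
            cb({"lat": lat, "lng": lng})
```

In the real system the callback is a WebSocket send on the node holding the rider's connection; because only that node subscribes to the trip's channel, each publish fans out to exactly one subscriber.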


Potential Deep Dives

1. How do we index driver locations for efficient geospatial queries?

Three constraints drive the design:

  • The geospatial index must hold up to 1M concurrent available driver positions.
  • A GEOSEARCH call must return the nearest N drivers within a given radius in under 10ms.
  • The index receives 250K updates per second at peak. Every location write from an available driver is a write to this index.
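The geohash idea that underpins prefix-based geospatial indexing is compact enough to sketch in full. This is a standard geohash encoder, not Redis's internal implementation (Redis Geo uses a 52-bit interleaved encoding stored as a sorted-set score), but it shows why nearby points share index prefixes:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat, lng, precision=5):
    """Interleave longitude/latitude bisections into a base32 geohash.
    Nearby points share a prefix, so prefix-range scans over an ordered
    index answer proximity queries without a distance function."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    out, bits, bit_count, even = [], 0, 0, True
    while len(out) < precision:
        if even:  # even bit positions bisect longitude
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:     # odd bit positions bisect latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # 5 bits per base32 character
            out.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(out)

print(geohash(57.64911, 10.40744, 5))  # u4pru (classic geohash test point)
```

A precision-5 geohash cell is roughly 5km on a side, which is why the surge demand index keys on geohash5: it approximates the same neighborhood a 5km radius query would cover.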

2. How do we stream driver location to the rider in real time?

Three constraints drive the design:

  • The rider app must see the driver's GPS position update within 5 seconds of the driver moving.
  • At 500K concurrent active trips with a 4-second GPS interval, position updates must scale to 125K pushes per second to rider apps.
  • Mobile network conditions are unreliable. The delivery mechanism must survive brief disconnections without losing updates.

3. How do we handle matching at scale?

Three constraints drive the design:

  • Match must complete within 5 seconds of the trip request.
  • At 500 trip requests per second peak, the matching pipeline cannot introduce serial bottlenecks.
  • A driver must never receive two simultaneous trip offers, and a trip must never be assigned to two drivers.

4. How do we implement surge pricing?

Three constraints drive the design:

  • Surge multipliers must reflect current supply and demand conditions, not data that is minutes old.
  • Computing surge inline on every trip request (a real-time aggregation at request time) cannot add meaningful latency to the match path.
  • The surge multiplier for a given area must be consistent across all Trip Service instances. Two riders requesting a trip from the same block at the same second must see the same price.
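One way to satisfy all three constraints is a periodic worker that writes per-cell multipliers into a TTL'd cache, with the Trip Service doing a single read. A sketch with an in-memory cache stand-in; the pricing policy shown is purely illustrative:

```python
import time

class SurgeWorker:
    """Sketch of the background surge computation (runs every ~30s).
    demand_by_cell and supply_by_cell are per-cell counts a real worker
    would derive from the Redis demand index and the geo set; `cache`
    stands in for the surge:{cell} keys with a TTL."""

    def __init__(self, cache, ttl_s=60):
        self.cache, self.ttl_s = cache, ttl_s

    def recompute(self, demand_by_cell, supply_by_cell, now=None):
        now = time.time() if now is None else now
        for cell, demand in demand_by_cell.items():
            supply = max(supply_by_cell.get(cell, 0), 1)  # avoid div-by-zero
            ratio = demand / supply
            # Illustrative policy: surge starts above 1.5x demand, caps at 3x.
            multiplier = min(max(ratio / 1.5, 1.0), 3.0)
            # Stand-in for SETEX surge:{cell}: value plus expiry timestamp.
            self.cache[f"surge:{cell}"] = (round(multiplier, 2), now + self.ttl_s)

def surge_for(cache, cell, now):
    """Trip Service read path: one cache GET, defaulting to 1.0 on miss/expiry."""
    value = cache.get(f"surge:{cell}")
    if value is None or value[1] < now:
        return 1.0
    return value[0]
```

Because every Trip Service instance reads the same cache key, two riders in the same cell in the same second see the same multiplier, and the match path pays only one sub-millisecond GET.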

Final Architecture

flowchart LR
  subgraph Clients["👤 Client Layer"]
    direction TB
    RA(["👤 Rider App\nMobile / web"])
    DA(["🚗 Driver App\nMobile"])
  end

  subgraph Gateway["🔀 Gateway Layer"]
    AG["🔀 API Gateway\nAuth · rate limiting · TLS\nRoutes to Trip, Location, Streaming"]
  end

  subgraph AppTier["⚙️ Application Tier"]
    direction TB
    TS["⚙️ Trip Service\nCreate · Accept\nSurge lookup"]
    LS["⚙️ Location Service\nGPS ingestion\nGEOADD/ZREM · PUBLISH"]
    LSS["⚙️ Location Streaming\nWebSocket per trip\nSub to Redis Pub/Sub"]
  end

  subgraph AsyncTier["📨 Async Pipeline"]
    KF["📨 Kafka\nTripRequested · Accepted\nCompleted · at-least-once"]
    MW["⚙️ Match Worker\nKafka consumer\nGEOSEARCH · parallel dispatch"]
    SW["⚙️ Surge Worker\nBackground · every 30s\nWrites surge:{cell}"]
  end

  subgraph CacheTier["⚡ Cache Tier"]
    direction TB
    RG["⚡ Redis Geo Cluster\ndrivers:available\nPartitioned by region"]
    RP["⚡ Redis Pub/Sub\ntrip:{id}:location channels\nEphemeral delivery"]
    RS["⚡ Redis Surge Cache\nsurge:{geohash5}\n60s TTL · sub-ms reads"]
  end

  subgraph DBTier["🗄️ Database Tier"]
    direction TB
    TDB[("🟢 Trip DB (Postgres)\ntrip_id · rider_id · driver_id\nstatus · pickup · dropoff")]
    TDR[("🔵 Read Replica\nRead-only · async replication")]
  end

  RA -->|"POST /trips · GET /trips/{id}"| AG
  DA -->|"POST /location · PUT /accept"| AG
  AG -->|"Trip requests"| TS
  AG -->|"GPS updates"| LS
  AG -->|"WebSocket upgrade"| LSS
  TS -->|"INSERT trip"| TDB
  TS -->|"GET surge:{cell}"| RS
  TS -->|"TripRequestedEvent"| KF
  LS -->|"GEOADD / ZREM"| RG
  LS -->|"PUBLISH location"| RP
  RP -->|"location event"| LSS
  LSS -->|"WS push · lat/lng"| RA
  KF -->|"Consume event"| MW
  MW -->|"GEOSEARCH within radius"| RG
  MW -->|"Offer dispatch"| TS
  TS -->|"atomic UPDATE status"| TDB
  TDB -.->|"Async replication"| TDR
  SW -.->|"GEOSEARCH supply per cell"| RG
  SW -.->|"SETEX surge:{cell} every 30s"| RS

The architecture has three distinct data planes. The location ingestion plane (Driver App to Location Service to Redis Geo) handles 250K writes per second and is entirely in-memory. The matching plane (Kafka to Match Worker to Redis Geo to Trip DB) handles 500 transactions per second and is the only plane that touches the relational database on the critical path. The streaming plane (Location Service to Redis Pub/Sub to Location Streaming Service to Rider App) handles 500K concurrent connections and is stateless except for the Redis Pub/Sub channels.


Interview Cheat Sheet

  • Start by separating the location write path (drivers broadcasting GPS) from the matching read path (riders triggering geospatial queries). They have nothing in common and scale differently.
  • State the dominant numbers early: 1M concurrent available drivers, 250K location writes per second, 500 trip requests per second peak. The ratio shapes the architecture.
  • Redis GEOSEARCH is the right tool for driver proximity queries: O(N+log M) per query, sub-millisecond for typical result sets, no SQL join or distance function required.
  • Partition Redis geo sets by geographic region (US-East, EU-West, APAC), not by driver_id hash. A radius query must land in one shard; hashing destroys geographic locality.
  • Parallel offer dispatch to top 5 candidates beats sequential offers: match latency equals the fastest willing driver, not the sum of timeouts.
  • Use UPDATE status = 'accepted' WHERE status = 'requested' as the atomic claim. No distributed lock needed. Zero rows updated means another driver won.
  • WebSocket with Redis Pub/Sub fan-out eliminates polling. One Redis channel per active trip; one subscription per rider WebSocket connection. Fan-out is one-to-one per trip.
  • Sticky routing by trip_id at the WebSocket load balancer means the node holding the rider's WebSocket is the same node subscribed to the Redis Pub/Sub channel. No cross-node messaging.
  • Surge pricing belongs in a pre-computed cache, not inline computation per request. A 30-second background worker computes per-geohash multipliers and writes them with 60-second TTL. Trip requests do one Redis GET.
  • Ghost driver cleanup: any driver with a location update older than 60 seconds should be evicted from the geo index by a background sweeper. Stale entries break matching by offering trips to unreachable drivers.
  • The Trip DB handles approximately 2,000 writes per second at peak (500 trip creations plus ~1,500 status updates). A single Postgres primary with a connection pool handles this comfortably; the database is not the bottleneck.
  • On driver app reconnect after a network drop, the next GPS POST re-inserts the driver into the geo index automatically. No special reconnect protocol is needed on the server side.
