Proximity Service
Design a location-based search system that answers 'what's near me?' in milliseconds for 100M+ queries per day, covering geohashing, spatial indexes, and the key differences between static and dynamic proximity use cases.
What is a proximity service?
A proximity service answers "what's near me?": given a latitude/longitude point and a search radius, return a ranked list of businesses or users within that area. The interesting engineering challenge is not the concept but the index. Scanning 500M lat/lng rows for every query is a full table scan, and at 58K requests/second that means instant saturation. This question tests spatial indexing (geohashing, quadtrees), caching strategies for a 100:1 read-heavy workload, and the architectural fork between static data (Yelp businesses) and dynamic data (Uber drivers, nearby friends).
I love this question in interviews because it starts deceptively simple (just filter by distance!) and then forces you to invent a spatial index from first principles. The moment you realize a B-tree cannot index two dimensions simultaneously, the real design begins.
Functional Requirements
Core Requirements
- Given a user's location (lat/lng) and a search radius, return a list of nearby businesses sorted by distance.
- Business owners can add, update, and delete listings.
- Users can filter results by category (restaurants, gyms) and minimum rating.
Below the Line (out of scope)
- Real-time friend or driver location updates
- Reviews, photos, and ratings management
- Turn-by-turn navigation
The hardest part in scope: Indexing billions of lat/lng coordinates so that "find everything within 5km of this point" completes in under 100ms, not a full table scan.
Real-time location updates are below the line because they introduce a separate architecture: persistent WebSocket connections, Redis GEO writes, and a completely different data freshness model. To add them, I would build a dedicated location update service that accepts position pushes over WebSockets and writes them to Redis via GEOADD. The search path then reads from Redis with GEORADIUS rather than from the business database.
Reviews and ratings live in a separate content service. They share a business_id with the business record but do not affect spatial indexing. To add them, attach a reviews table keyed on business_id and serve aggregated ratings via a separate content API.
Turn-by-turn navigation is a solved problem (Google Maps, Mapbox). Whether in scope or out, reference a third-party routing API rather than building from scratch.
Non-Functional Requirements
Core Requirements
- Scale: 500M businesses, 100M DAU, 5B location queries per day.
- Latency: Under 100ms p99 for nearby search queries.
- Availability: 99.99% uptime. Availability favored over consistency (slightly stale business results are better than a 500 error).
- Accuracy: Results within 100 meters of true position.
Below the Line
- Sub-10ms latency via edge caching for the hottest paths
- Real-time analytics on search patterns and query distribution
Read/write ratio: This is overwhelmingly read-heavy, 100:1 or more. Businesses add and update listings occasionally. Search queries happen constantly. This ratio drives aggressive caching and read replicas. Nearly every scaling decision traces back to it.
5B queries per day works out to approximately 58,000 requests per second on average, with peaks higher still. A single database with a naive lat/lng scan cannot approach that throughput. The 99.99% availability target eliminates single-node architectures for both the read and write paths.
Core Entities
- Business: A listing with a name, category, rating, and a fixed geographic position. The primary read target for all search queries.
- Location: The geographic coordinate of a business, stored as `lat,lng`, plus a precomputed `geohash` string for fast spatial queries. Updated whenever a business is created or its position changes.
- SearchResult: The response shape for a nearby search: a ranked list of businesses with name, category, rating, and computed distance. Ephemeral, never stored.
- UserLocation: For dynamic proximity use cases (nearby friends, drivers), a user's current lat/lng with a `last_updated` timestamp. Lives in a separate in-memory store, not the business database.
The full schema, indexes, and column types belong in a data model deep dive. What matters here: Business is static and persisted in a relational database; UserLocation is dynamic and requires an in-memory store with fast writes and short TTLs.
API Design
FR 1 - search nearby businesses:
GET /search/nearby
Params: lat, lng, radius_km, category?, min_rating?, limit=20, cursor?
Response: { businesses: [...], next_cursor: "..." }
GET with query params because these requests are read-only and inherently cacheable. A CDN or API gateway can cache GET /search/nearby?lat=37.7&lng=-122.4&radius_km=1&category=restaurants for 60 seconds with no application logic changes. Cursor-based pagination handles large result sets correctly when new businesses are inserted between pages.
Radius in kilometers rather than degrees because degrees are not constant distances (1 degree of longitude is 111km at the equator, 0km at the poles). Kilometers give a consistent, human-readable API contract.
FR 2 - CRUD for business listings:
POST /businesses
PUT /businesses/{id}
DELETE /businesses/{id}
These are write operations executed by business owners, not searchers. They hit the write path and trigger invalidation of the relevant geohash cell cache on update or delete. The geohash field is computed server-side; clients submit raw lat/lng only.
High-Level Design
1. Single server with naive lat/lng query
The simplest possible system: one application server, one PostgreSQL database, lat/lng float columns on the businesses table.
Components:
- Client: Web or mobile app sending `GET /search/nearby` with lat, lng, and radius.
- App Server: Converts the radius in km to degree deltas and runs a 2D bounding-box query.
- PostgreSQL: Stores all businesses with `lat FLOAT` and `lng FLOAT` columns. No spatial index.
Request walkthrough:
- Client sends `GET /search/nearby?lat=37.7749&lng=-122.4194&radius_km=5`.
- App server converts 5km to degree deltas (roughly 0.045 degrees latitude).
- App server runs `SELECT * FROM businesses WHERE lat BETWEEN (user_lat - delta) AND (user_lat + delta) AND lng BETWEEN (user_lng - delta) AND (user_lng + delta)`.
- App server sorts results by Haversine distance and returns the top 20.
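The degree-delta conversion in step 2 can be sketched as a small helper. This is an illustrative sketch, not production code: it assumes 111.32 km per degree of latitude and shrinks a degree of longitude by cos(latitude).

```python
import math

def bounding_box(lat: float, lng: float, radius_km: float):
    """Convert a km radius into the degree deltas the naive 2D range
    query needs. One degree of latitude is ~111.32 km everywhere; one
    degree of longitude shrinks by cos(latitude) away from the equator.
    """
    dlat = radius_km / 111.32
    dlng = radius_km / (111.32 * math.cos(math.radians(lat)))
    return (lat - dlat, lat + dlat, lng - dlng, lng + dlng)

# Deltas for a 5km search around downtown San Francisco:
min_lat, max_lat, min_lng, max_lng = bounding_box(37.7749, -122.4194, 5)
```

The resulting box feeds directly into the `BETWEEN` clauses of the naive query; note that it is already an approximation, and the longitude delta widens toward the poles.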
What breaks: with no spatial index, `WHERE lat BETWEEN ... AND lng BETWEEN ...` scans every row. Even with a B-tree index on lat, the database constrains only one dimension efficiently and must still scan every row in the matching lat band to apply the lng filter. At 500M businesses and 58K requests/second, the database saturates immediately.
I'd draw this naive version first on the whiteboard and explicitly say: "this works at 10,000 businesses, here's why it breaks at 500 million." Interviewers want to see you reason about when a design stops working, not just jump to the optimal answer.
2. Geohash index
The key insight: encode each business's location as a string where shared prefix means geographic proximity. A database WHERE geohash LIKE 'dr5ru%' is an efficient B-tree prefix scan, not a full table scan.
Geohashing divides the Earth into rectangular cells using alternating longitude/latitude bit splits. Each additional character adds 5 bits, dividing the cell into 32 sub-cells. A 6-character geohash covers approximately 1.2km x 0.6km. Businesses in the same cell share the same 6-character prefix.
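To make the prefix property concrete, here is a from-scratch encoder (a minimal sketch; production code would use an existing geohash library):

```python
def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode lat/lng into a geohash by alternating longitude/latitude
    binary subdivisions, emitting one base32 character per 5 bits."""
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    even = True  # even-numbered bits split longitude, odd split latitude
    while len(chars) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = bits * 2 + 1, mid
            else:
                bits, lng_hi = bits * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # 5 bits = one base32 character
            chars.append(base32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))     # San Francisco -> "9q8yyk"
print(geohash_encode(37.7749, -122.4194, 5))  # bigger cell   -> "9q8yy"
```

Note the prefix property: the 5-character hash is literally a prefix of the 6-character one, which is what makes `LIKE 'prefix%'` a valid proximity query.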
Components (added):
- Geohash column: A `VARCHAR(12)` geohash column on the businesses table with a B-tree index, precomputed on insert and updated on position change.
Request walkthrough:
- Client sends `GET /search/nearby?lat=37.7749&lng=-122.4194&radius_km=5`.
- App server encodes the user's position to a 6-character geohash: `9q8yyk`.
- App server computes the 8 neighboring cell hashes, forming a 3x3 grid centered on `9q8yyk`.
- App server runs one B-tree prefix scan per cell: `SELECT * FROM businesses WHERE geohash LIKE '9q8yyk%'`, and likewise for each neighbor (9 cells total).
- App server filters by exact Haversine distance and returns sorted results.
Never query only the target cell. A business 50 meters away in an adjacent cell has a completely different geohash prefix and will be silently excluded. Always query the target cell plus its 8 neighbors.
One problem remains: geohash cells are rectangles, not circles. The 9-cell query returns some businesses outside the true circular radius. The Haversine post-filter in step 5 removes these false positives. This is cheap (runs in memory) and mandatory for the accuracy NFR.
I always call out that the Haversine filter is not optional. I've seen candidates skip it and then get tripped up when the interviewer asks: "what about a business at the corner of the 9-cell grid that's actually 8km away?" The post-filter is your safety net.
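A sketch of that safety net, assuming the 9-cell query returns (business_id, lat, lng) rows:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lng1, lat2, lng2):
    """Exact great-circle distance between two lat/lng points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def post_filter(user_lat, user_lng, radius_km, candidates):
    """Drop 9-cell candidates outside the true circle, sorted by distance.
    `candidates` holds (business_id, lat, lng) rows from the cell query."""
    scored = ((haversine_km(user_lat, user_lng, lat, lng), biz_id)
              for biz_id, lat, lng in candidates)
    return sorted((d, b) for d, b in scored if d <= radius_km)
```

Running this over a few hundred in-memory candidates costs microseconds, which is why skipping it buys nothing and silently violates the 100m accuracy requirement.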
3. Redis geospatial cache layer
5B queries/day at 58K requests/second requires caching. The geohash-to-businesses mapping for popular areas changes rarely (restaurants do not move). Redis provides the GEOADD and GEORADIUS commands (GEORADIUS is superseded by GEOSEARCH as of Redis 6.2) with throughput exceeding 100K operations/second.
Components (added):
- Load Balancer: Distributes requests across multiple app server instances.
- Redis GEO: In-memory geospatial index. Stores businesses per geohash region. Answers GEORADIUS queries in under 1ms.
- Read Replica: PostgreSQL replica absorbs all read queries on cache miss. Primary handles writes only.
Request walkthrough (read path):
- Client sends `GET /search/nearby`.
- Load balancer routes to an available app server.
- App server computes target geohash and 8 neighbors.
- App server queries Redis GEORADIUS for businesses in the 9-cell region.
- Cache hit: returns results directly. Cache miss: queries the read replica, populates Redis with 60s TTL.
Request walkthrough (write path):
- Business owner sends `POST /businesses` or `PUT /businesses/{id}`.
- App server writes to the primary PostgreSQL database (source of truth).
- App server invalidates the Redis cache entry for the affected geohash cell.
This architecture handles the 100:1 read/write ratio. Reads are served from Redis in under 1ms for cache hits, covering the majority of queries in dense urban areas (thousands of people hitting the same geohash cells). The read replica absorbs cache misses while the primary handles infrequent writes.
I'd pause here in an interview and say: "this handles the static business case. But if the interviewer extends the problem to nearby friends or drivers, the entire data layer changes." Planting that seed early shows you understand the static-vs-dynamic fork before you're asked about it.
Potential Deep Dives
1. How do we index geographic data?
The core challenge: "find everything within a circle" does not map onto a standard B-tree index. B-trees index one dimension at a time. We need an index that naturally captures spatial proximity.
2. How do we handle geohash boundary misses?
The boundary problem: two businesses can be 10 meters apart but in adjacent geohash cells with completely different 6-character prefixes. A naive single-cell query silently misses the neighbor.
3. How do we handle real-time location updates (nearby friends / drivers)?
Static businesses do not move. Nearby-friends and ride-sharing features require fresh position data every few seconds. This is a fundamentally different workload: high write throughput, short data TTL, and sub-second freshness requirements the business architecture cannot satisfy.
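The shape of that workload can be illustrated with a minimal in-memory stand-in for Redis GEO (a sketch, not the production store): last-write-wins position updates, plus reads that skip entries older than a staleness cutoff.

```python
import math
import time

STALE_AFTER_SECONDS = 30  # drop drivers that have not reported recently

def _haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

class LivePositionStore:
    """In-memory stand-in for the Redis GEO pattern: GEOADD-style
    last-write-wins updates, staleness-filtered radius reads."""

    def __init__(self):
        self._positions = {}  # user_id -> (lat, lng, updated_at)

    def update(self, user_id, lat, lng, now=None):
        """Write path: one cheap overwrite per position push (every ~5s)."""
        self._positions[user_id] = (lat, lng, time.time() if now is None else now)

    def nearby(self, lat, lng, radius_km, now=None):
        """Read path: distance-sorted fresh members within the radius."""
        now = time.time() if now is None else now
        results = []
        for user_id, (ulat, ulng, ts) in self._positions.items():
            if now - ts > STALE_AFTER_SECONDS:
                continue  # driver stopped reporting; treat as gone
            d = _haversine_km(lat, lng, ulat, ulng)
            if d <= radius_km:
                results.append((d, user_id))
        return sorted(results)
```

The contrast with the business path is the point: every update is an in-place overwrite with no index rebuild, and freshness is enforced at read time rather than by cache invalidation.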
4. How do we scale to 5B queries per day?
58K requests/second with a 100ms p99 SLA is a serious throughput target. The HLD baseline handles the basics. This deep dive addresses the bottlenecks that emerge at true production scale.
Final Architecture
Geographic sharding and geohash cell caching (CDN + Redis) absorb 90%+ of read load because location queries are spatially clustered. Users in Manhattan search within Manhattan. A CDN node in New York City serves those responses from cache without a single request reaching the database. PostgreSQL shards only see cache misses and write operations.
Interview Cheat Sheet
- Geohash encodes a lat/lng pair into a string prefix: nearby locations share the same prefix. A 6-character geohash covers roughly 1.2km x 0.6km. Longer strings produce smaller cells and higher spatial precision.
- Always query 9 cells, never 1: query the target geohash cell plus its 8 neighbors (3x3 grid). This guarantees no boundary misses within one cell width of any edge. Geohash libraries return all 8 neighbors in O(1).
- Haversine post-filter is mandatory after the 9-cell query: the cells form a rectangle, not a circle. Filter results in memory by true distance to remove false positives. This step is cheap and cannot be skipped.
- Geohash vs. quadtree tradeoff: geohash uses fixed-size rectangular cells (easy to implement, works well for uniformly distributed data). Quadtrees adapt to data density (better for uneven distributions like urban vs. rural business density, but harder to persist and update). Lead with geohashing in an interview; mention quadtrees as the production alternative that handles density spikes.
- Static vs. dynamic proximity are different architectures: businesses (static, rarely change position) use PostgreSQL with a geohash B-tree index and a Redis cell cache. Drivers and friends (dynamic, position changes every 5 seconds) use Redis GEO commands (GEOADD + GEORADIUS) with aggressive expiry of stale positions. Never try to store live driver positions in PostgreSQL.
- Redis GEO for real-time positions: GEOADD is O(log N) per write. GEORADIUS returns sorted nearby members in sub-millisecond time. Throughput exceeds 100K operations/second per Redis instance. Note that Redis offers no per-member TTL on the underlying sorted set, so expire stale drivers (e.g. after 30s) by rotating time-bucketed keys with a key-level TTL or by periodically removing members with old timestamps.
- Geographic sharding by geohash prefix: shard PostgreSQL by the leading geohash characters. The first character alone yields 32 coarse geographic regions; the first two yield 1,024 finer cells that can be grouped onto shards. Proximity searches almost never cross shard boundaries because users search near themselves.
- CDN caches geohash cell responses at 60s TTL: popular searches repeat thousands of times per minute. Quantize lat/lng to 3 decimal places (~100m resolution) before building CDN cache keys to maximize hit rate. Business updates invalidate only the affected cell's cache, not the global cache.
- 5B queries/day ≈ 58K req/sec sustained (plan capacity for higher peaks): Redis GEO handles 100K+ ops/sec; a warm CDN absorbs 90%+; PostgreSQL shards only see cache misses.
- The read/write ratio is 100:1 or higher: aggressive caching (Redis, CDN) is not optional, it is the design. Reads dominate by two orders of magnitude and every component choice reflects that.
- Precision selection for the 9-cell search: use a lookup table (1km radius = precision 6; 5km = precision 5; 50km = precision 4). Do not compute precision dynamically from the radius during an interview; state the table and move on.
- WebSocket beats REST polling for real-time location: REST polling at 1s intervals for 100M users equals 100M req/sec. WebSocket push + Redis GEOADD at 5s intervals equals 20M writes/sec served from in-memory Redis, not the database.
- Boundary misses are a correctness problem, not a performance problem: missing a business 10m away in an adjacent cell violates the 100m accuracy NFR. The 9-cell pattern fixes correctness; the Haversine post-filter removes false positives inside the returned cells.
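Two of the bullets above (precision selection and CDN key quantization) reduce to a few lines. The cell sizes in the comment are approximate, and the key format is a hypothetical example:

```python
# Approximate geohash cell sizes: precision 4 ~ 39km x 19.5km,
# precision 5 ~ 4.9km x 4.9km, precision 6 ~ 1.2km x 0.61km.
PRECISION_FOR_RADIUS_KM = [(1, 6), (5, 5), (50, 4)]

def precision_for_radius(radius_km):
    """Lookup-table precision selection for the 9-cell search."""
    for max_km, precision in PRECISION_FOR_RADIUS_KM:
        if radius_km <= max_km:
            return precision
    return 3  # very wide searches fall back to coarse cells

def cdn_cache_key(lat, lng, radius_km, category=""):
    """Quantize lat/lng to 3 decimals (~100m) so nearby users share one
    cached response. The key format itself is a made-up example."""
    return f"search:{round(lat, 3)}:{round(lng, 3)}:{radius_km}:{category}"

print(cdn_cache_key(37.77491, -122.41942, 5, "restaurants"))
# -> search:37.775:-122.419:5:restaurants
```

Quantization is what makes the CDN useful: without it, every GPS reading produces a unique URL and the hit rate collapses to zero.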