Marketplace
Walk through a complete marketplace design, from a basic listing service to a geospatial-aware search platform handling 100M DAU with sub-200ms search, location-based discovery, and real-time seller-buyer messaging.
What is an online marketplace?
An online marketplace, like Craigslist or Facebook Marketplace, connects sellers with buyers nearby. The interesting engineering challenge is combining geospatial search, full-text search, and real-time messaging into a single coherent system that stays fast when listings number in the hundreds of millions.
Functional Requirements
Core Requirements
- Sellers can create listings with title, description, price, photos, and location.
- Buyers can browse and search listings filtered by category, price range, and proximity to their location.
- Buyers can message sellers directly about a specific listing.
- Sellers can mark a listing as sold, hiding it from search results.
Below the Line (out of scope)
- Integrated payments and escrow
- Buyer and seller reviews and reputation scores
- Promoted or sponsored listing placement
- Dispute resolution and fraud detection
The hardest part in scope: Geospatial search combined with full-text filtering. A buyer types "vintage guitar" and wants results sorted by distance, not just text relevance. That combination of geo and text is where the design gets interesting, and where naive SQL queries fall apart at scale.
Integrated payments are below the line because they require a licensed payment processor, escrow logic, and regulatory compliance. To add them, I would integrate Stripe Connect for marketplace payments and build a separate escrow service that holds funds until the buyer confirms receipt.
Reputation scores are below the line because they require a review submission pipeline and fraud detection to prevent fake reviews. To add them, I would add a Review entity after a transaction closes and roll up scores asynchronously into seller profiles.
Promoted listings are below the line because they introduce an ad auction mechanism. To add them, I would build a thin ad-serving layer that injects sponsored results at fixed positions in the search response.
Non-Functional Requirements
Core Requirements
- Availability: 99.9% uptime. Availability over consistency for search (a slightly stale search result is acceptable; a failed search is not).
- Search latency: Search results return in under 200ms p99, including geo-filter and text-match scoring.
- Scale: 100M DAU, 500M total active listings. Peak write rate: ~2,000 new listings per second. Peak search rate: ~50,000 searches per second.
- Message delivery: Messages between buyer and seller delivered within 500ms.
- Durability: Listings and messages are never lost. Photos stored durably in object storage.
Below the Line
- Sub-50ms search via CDN-edge caching of popular query results
- Real-time sold status propagation across all active sessions
Read/write ratio: At peak, 50,000 searches per second against 2,000 new listings per second works out to roughly 25 searches for every listing created. This 25:1 read skew shapes the entire storage and caching strategy. The search path must be fast and horizontally scalable. The write path handles a small fraction of the traffic.
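The ratio above can be checked with a few lines of arithmetic. This is only a sketch using the peak figures from the scale requirement; the variable names are illustrative.

```python
# Peak figures taken from the non-functional requirements.
PEAK_SEARCHES_PER_SEC = 50_000
PEAK_LISTING_WRITES_PER_SEC = 2_000

# Searches served per listing created, at peak.
read_write_ratio = PEAK_SEARCHES_PER_SEC / PEAK_LISTING_WRITES_PER_SEC

# Share of combined peak traffic that lands on the write path.
write_share = PEAK_LISTING_WRITES_PER_SEC / (
    PEAK_SEARCHES_PER_SEC + PEAK_LISTING_WRITES_PER_SEC
)

print(f"read:write ratio = {read_write_ratio:.0f}:1")
print(f"write share of combined peak traffic = {write_share:.1%}")
```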
Under 200ms search latency means a naive SELECT * FROM listings WHERE ST_DWithin(location, ?, ?) against a 500M-row PostgreSQL table is not viable without spatial indexing. Even with a PostGIS GiST index, filtering 500M rows by geo-box and then by text is slow without a dedicated search engine. The 100M DAU target means the search tier must scale horizontally with no single bottleneck.
Core Entities
- Listing: The core object. Carries title, description, price, category, status (active/sold), and a geographic coordinate (latitude + longitude). Belongs to exactly one seller.
- Photo: A binary asset attached to a listing. Stored in object storage (S3); the listing record stores only the photo URLs.
- User: The account that creates listings or sends messages. Carries an ID, display name, and an optional saved location for proximity defaults.
- Message: A single message in a conversation between a buyer and a seller about a specific listing. A conversation is implicitly defined by the (listing_id, buyer_id, seller_id) triple.
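Because a conversation is defined by the triple rather than by a separately allocated row, one possible way to obtain a stable conversation identifier is to hash the triple. The sketch below assumes string IDs; the `ConversationKey` class and the choice of SHA-256 are illustrative assumptions, not part of the original design.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationKey:
    """Hypothetical value object for the (listing_id, buyer_id, seller_id) triple."""

    listing_id: str
    buyer_id: str
    seller_id: str

    def conversation_id(self) -> str:
        # JSON-encode the triple so that IDs containing separators cannot collide.
        raw = json.dumps([self.listing_id, self.buyer_id, self.seller_id])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


key = ConversationKey("listing-42", "buyer-7", "seller-3")
print(key.conversation_id())
```

The same triple always maps to the same ID, so the first message from a buyer and every later reply land in one conversation without a lookup.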
Full schema, indexes, and column types are deferred to the data model deep dive. These four entities are sufficient to drive the API design and High-Level Design.
API Design
Start with one endpoint per functional requirement, then evolve where the naive shape needs adjustment.
FR 1: Create a listing
POST /listings
Authorization: Bearer <token>
Body: {
title: string,
description: string,
price_cents: number,
category: string,
location: { lat: number, lng: number },
photo_ids: string[] // pre-uploaded to S3, see note below
}
Response: 201 { listing_id, status: "active" }
Photos are not included in this request body. Embedding binary files in JSON is inefficient and creates timeouts on large images. Instead, clients upload photos directly to S3 via pre-signed URLs (a separate POST /photos/upload-url endpoint returns a short-lived signed URL). Once uploaded, the client passes the resulting photo IDs to this endpoint.
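To illustrate the idea behind a pre-signed upload URL, here is a simplified sketch using an HMAC over the method, path, and expiry. This is not the AWS Signature Version 4 algorithm that S3 actually uses; the secret, host name, and function names are hypothetical, and the 600-second lifetime is only an example of "short-lived".

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only
BASE_URL = "https://uploads.example.com"  # hypothetical upload host


def _signature(path: str, expires: int) -> str:
    payload = f"PUT\n{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_upload_url(photo_id: str, now: int, ttl_seconds: int = 600) -> str:
    """Return a URL that authorizes one PUT to this photo path until it expires."""
    path = f"/photos/{photo_id}"
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{BASE_URL}{path}?{query}"


def verify_upload_url(url: str, now: int) -> bool:
    """Check expiry and signature the way the storage side would."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    return hmac.compare_digest(signature, _signature(parsed.path, expires))
```

The point of the design is that the API server only signs a small string; the photo bytes never pass through it.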
FR 2: Search listings
GET /listings/search
Query: {
q?: string, // full-text query (e.g. "vintage guitar")
lat: number,
lng: number,
radius_km?: number, // optional; defaults to 25 km
category?: string,
min_price?: number,
max_price?: number,
cursor?: string, // for cursor-based pagination
limit?: number // default 20
}
Response: 200 {
listings: [Listing],
next_cursor: string | null
}
Cursor-based pagination over offset pagination because search results shift as new listings are posted. Offset pagination would show duplicates or skip items; a cursor anchors the result window to a stable position.
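A minimal sketch of how such a cursor might work, assuming results are ordered by (distance, listing_id) and the cursor is an opaque base64 encoding of the last row returned. The function names and the in-memory list standing in for the search backend are assumptions for illustration.

```python
import base64
import json


def encode_cursor(sort_value: float, listing_id: str) -> str:
    raw = json.dumps([sort_value, listing_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    sort_value, listing_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return (sort_value, listing_id)


def fetch_page(rows, cursor=None, limit=20):
    """rows: iterable of (distance_km, listing_id) tuples standing in for search hits."""
    ordered = sorted(rows)
    if cursor is not None:
        last_seen = decode_cursor(cursor)
        # Keep only rows strictly after the last row the client already saw.
        ordered = [row for row in ordered if row > last_seen]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(*page[-1]) if has_more else None
    return page, next_cursor


rows = [(1.0, "a"), (2.0, "b"), (3.0, "c")]
first_page, cursor = fetch_page(rows, limit=2)
# A new listing appears before the client requests page two.
second_page, _ = fetch_page(rows + [(0.5, "z")], cursor=cursor, limit=2)
print(first_page, second_page)
```

With offset pagination the newly inserted row would push "b" onto page two and show it twice; with the cursor, page two starts strictly after the last row already returned. Elasticsearch exposes a similar mechanism through its `search_after` parameter.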
FR 3: Send a message to a seller
POST /listings/{listing_id}/messages
Authorization: Bearer <token>
Body: { text: string }
Response: 201 { message_id, conversation_id, sent_at }
The server derives seller_id from the listing, and buyer_id from the auth token. No need to pass either in the body.
FR 4: Get conversation messages
GET /listings/{listing_id}/messages
Authorization: Bearer <token>
Query: { cursor?: string, limit?: number }
Response: 200 {
messages: [Message],
next_cursor: string | null
}
FR 5: Mark a listing as sold
PATCH /listings/{listing_id}
Authorization: Bearer <token>
Body: { status: "sold" }
Response: 200 { listing_id, status: "sold" }
PATCH rather than a dedicated /listings/{id}/sold endpoint because status is a field on the listing. PATCH is idiomatic for partial updates. The server must validate that only the listing owner can change status.
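The ownership check can be sketched as follows. The `Listing` dataclass, the `Forbidden` exception, and the choice to treat a repeated "sold" request as a no-op are assumptions made for illustration.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the caller does not own the listing (maps to HTTP 403)."""


@dataclass
class Listing:
    listing_id: str
    seller_id: str
    status: str = "active"


def mark_sold(listing: Listing, caller_id: str) -> Listing:
    # caller_id comes from the validated auth token, never from the request body.
    if caller_id != listing.seller_id:
        raise Forbidden(f"user {caller_id} does not own listing {listing.listing_id}")
    # Assumption: marking an already-sold listing as sold is an idempotent no-op.
    listing.status = "sold"
    return listing


listing = Listing("listing-42", seller_id="seller-3")
print(mark_sold(listing, caller_id="seller-3").status)
```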
High-Level Design
1. Sellers can create a listing with photos and location
The write path: seller uploads photos to S3, then submits listing metadata to the Listing Service, which writes to the database.
Components:
- Client: Web or mobile app. Fetches a pre-signed S3 URL, uploads photos directly to S3, then POSTs listing metadata to the API.
- API Gateway: Routes requests, handles auth token validation, and enforces rate limits to prevent listing spam.
- Listing Service: Validates the listing fields, persists the record to PostgreSQL, and publishes a listing.created event to a message queue for async downstream processing (search indexing).
- PostgreSQL: Stores listing records with geographic coordinates as a PostGIS GEOGRAPHY column. This is the source of truth.
- S3: Stores raw photo bytes. The Listing Service stores only the photo URLs in PostgreSQL.
Request walkthrough:
- Client calls POST /photos/upload-url and receives a pre-signed S3 URL (valid for 10 minutes).
- Client uploads the photo directly to S3 using the pre-signed URL. S3 returns the photo URL.
- Client calls POST /listings with metadata including the photo URLs.
- Listing Service validates all fields and writes the listing row to PostgreSQL.
- Listing Service publishes listing.created event to Kafka for downstream processing.
- Listing Service returns 201 { listing_id, status: "active" } to the client.
This is the write path only. Photo bytes live in S3; the database stores only the URLs that reference them. The Kafka event seeds the search index asynchronously (next section).
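The validate, persist, publish sequence of the Listing Service can be sketched with in-memory stand-ins. Here a dict plays the role of PostgreSQL and a deque plays the role of the Kafka topic; the validation rules and names are illustrative assumptions, not the article's implementation.

```python
import uuid
from collections import deque

listings_table = {}    # stand-in for the PostgreSQL listings table
event_queue = deque()  # stand-in for the Kafka topic

REQUIRED_FIELDS = ("title", "description", "price_cents", "category", "location", "photo_ids")


def create_listing(seller_id: str, body: dict) -> dict:
    # 1. Validate the request body.
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if body["price_cents"] < 0:
        raise ValueError("price_cents must be non-negative")
    lat, lng = body["location"]["lat"], body["location"]["lng"]
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError("location out of range")

    # 2. Persist the row to the source of truth.
    listing_id = str(uuid.uuid4())
    row = {**body, "listing_id": listing_id, "seller_id": seller_id, "status": "active"}
    listings_table[listing_id] = row

    # 3. Publish the event for asynchronous consumers such as the search indexer.
    event_queue.append({"type": "listing.created", "listing_id": listing_id, "payload": row})

    # 4. Respond without waiting for any downstream processing.
    return {"listing_id": listing_id, "status": "active"}
```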
2. Buyers can browse and search by category, price, and location
This is where the interesting complexity lives. Buyers need two query patterns: structured browse (category + price filter, no text) and free-form search (text query + geo filter). These are different enough that a single naive SQL approach breaks for both.
The naive approach: SELECT * FROM listings WHERE category = ? AND price BETWEEN ? AND ? AND ST_DWithin(location, ST_MakePoint(lng, lat), radius).
This breaks at 500M rows even with a PostGIS index. PostGIS can answer the geo-filter efficiently, but adding full-text matching on top (for example a LIKE '%guitar%' predicate on title and description, which cannot use a B-tree index) saturates the database when 50,000 queries per second hit the same node.
The fix: Route search queries through a dedicated Elasticsearch cluster. Elasticsearch handles both geo_distance filtering and BM25 full-text scoring natively in one query, scales horizontally by adding shards, and keeps the PostgreSQL primary reserved for writes.
The Kafka consumer (Search Indexer) from the previous section consumes listing.created / listing.updated events and upserts documents into the Elasticsearch index. Eventual consistency here is acceptable: a listing appearing in search 1-2 seconds after creation is not a user-visible problem.
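A minimal sketch of the indexer's upsert logic, with a dict standing in for the Elasticsearch index. Keying documents by listing_id makes replayed events harmless. The event shape and function names are assumptions; the `lat`/`lon` keys follow the object form Elasticsearch accepts for geo_point fields.

```python
search_index = {}  # stand-in for the Elasticsearch index, keyed by document ID


def to_search_document(listing: dict) -> dict:
    """Flatten a listing row into the denormalized document stored for search."""
    return {
        "listing_id": listing["listing_id"],
        "title": listing["title"],
        "description": listing["description"],
        "price_cents": listing["price_cents"],
        "category": listing["category"],
        "status": listing["status"],
        # The API uses "lng"; the geo_point object form uses "lon".
        "location": {"lat": listing["location"]["lat"], "lon": listing["location"]["lng"]},
    }


def handle_event(event: dict) -> None:
    if event["type"] in ("listing.created", "listing.updated"):
        doc = to_search_document(event["payload"])
        # Upsert: the same listing_id always overwrites the same document.
        search_index[doc["listing_id"]] = doc
```

Because the write is an overwrite keyed by listing_id, delivering the same event twice leaves the index unchanged.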
Components added:
- Search Service: Translates the /listings/search query parameters into an Elasticsearch query with geo_distance filter, bool must (text), and range filters (price).
- Elasticsearch Cluster: Stores a denormalized listing document per listing. Handles geo_distance, full-text BM25 scoring, and filter aggregations.
- Search Indexer (Kafka consumer): Consumes listing.* events from Kafka and upserts documents into Elasticsearch. Runs async, not in the write path.
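The translation done by the Search Service might look roughly like the sketch below, which builds an Elasticsearch query body as a plain dict. The index field names (status, category, price_cents, location), the assumption that min_price and max_price are expressed in cents, and the distance-first sort are illustrative choices rather than details from the article.

```python
def build_search_query(lat, lng, radius_km=25, q=None, category=None,
                       min_price=None, max_price=None, limit=20):
    """Build an Elasticsearch request body for /listings/search parameters."""
    # Filters do not affect scoring and can be cached by Elasticsearch.
    filters = [
        {"term": {"status": "active"}},  # sold listings stay hidden
        {"geo_distance": {"distance": f"{radius_km}km",
                          "location": {"lat": lat, "lon": lng}}},
    ]
    if category is not None:
        filters.append({"term": {"category": category}})
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price_cents": price_range}})

    # Text goes in "must" so it contributes a BM25 relevance score.
    if q:
        must = [{"multi_match": {"query": q, "fields": ["title", "description"]}}]
    else:
        must = [{"match_all": {}}]  # structured browse with no text query

    return {
        "size": limit,
        "query": {"bool": {"must": must, "filter": filters}},
        # Nearest first, with relevance score as the tie-breaker.
        "sort": [
            {"_geo_distance": {"location": {"lat": lat, "lon": lng},
                               "order": "asc", "unit": "km"}},
            "_score",
        ],
    }
```

For example, `build_search_query(37.77, -122.42, q="vintage guitar", max_price=50000)` yields one request that combines the text match, the 25 km radius, and the price ceiling.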
Request walkthrough: