Hotel Booking
Design a hotel reservation system like Booking.com or Airbnb: from a simple availability calendar to a system that handles concurrent bookings, double-booking prevention, and room-level inventory management at scale.
What is a hotel booking system?
A hotel booking system lets guests search available rooms by location and date range, hold a room during checkout, and confirm a paid reservation. The system looks straightforward until you consider that millions of users search simultaneously and the availability question ("is room 214 free from April 1st to April 5th?") requires querying across a date range without scanning 500 million reservation rows. This question tests date-range indexing, concurrent inventory management, and the read/write split between an eventually consistent search path and a strongly consistent booking path.
I'd tell any candidate to focus on two things in this interview: the availability data model and the concurrency control. Everything else is standard CRUD. Those two problems are where the interesting engineering lives.
Functional Requirements
Core Requirements
- Users can search for available rooms by location, check-in date, check-out date, and guest count.
- Users can reserve a room and complete payment.
- Hotel managers can configure room types, prices, and availability.
- The system prevents double-booking of the same room on the same dates.
Below the Line (out of scope)
- Dynamic pricing and revenue management
- Reviews and ratings
Dynamic pricing requires a pricing engine with write access to RoomType.base_price that responds to real-time demand signals: occupancy rate, events in the area, competitor pricing. To add it, I would build a background pricing service that subscribes to booking events via a Kafka topic and updates prices outside the synchronous booking path. It does not touch the concurrency primitives that make this system interesting, so it stays out of scope.
Reviews and ratings is a read-heavy, eventually consistent feature with no write conflict with the booking path. The integration point is a review_submitted event published after a completed stay. A separate reviews service stores ratings and aggregates scores asynchronously without touching the inventory or reservation models.
The hardest part in scope: Checking room availability for a date range without scanning every stored reservation, and ensuring two users cannot book the same physical room for overlapping dates. These two problems drive nearly every design decision in this article.
Non-Functional Requirements
Core Requirements
- Scale: 50M DAU; ~500M total reservations stored; ~10,000 availability searches per second at peak.
- Booking throughput: ~1,000 bookings per second at peak.
- Latency: Search results under 200ms p99; booking confirmation under 500ms p99.
- Consistency: Strong consistency for room inventory. A room must never be double-booked.
- Database choice: PostgreSQL for both Hotels DB and Bookings DB. The booking write requires an ACID transaction that atomically locks inventory rows, increments a counter, and inserts a reservation. NoSQL databases such as DynamoDB and Cassandra do not offer pessimistic row-level locking across multiple rows (DynamoDB transactions are optimistic, and Cassandra's lightweight transactions are scoped to a single partition), which would force a distributed lock service with added latency and failure modes. Elasticsearch handles the search path where horizontal scale and schema flexibility actually apply.
- Availability: 99.99% uptime for the booking path (roughly 52 minutes downtime per year); search can tolerate brief eventual consistency.
- Durability: Confirmed reservations must never be silently lost across server failures.
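The 52-minute figure in the availability target is plain arithmetic on the error budget. A quick check, with an illustrative function name that is not part of the design:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# 99.99% leaves roughly 52.6 minutes per year for the booking path.
print(round(downtime_budget_minutes(0.9999), 1))
```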
Below the Line
- Multi-region active-active replication (a single-region design with read replicas is sufficient)
- Real-time fraud scoring during checkout (a separate async pipeline beside the booking path)
Read/write ratio: Availability searches outnumber bookings by roughly 100:1 overall. The peak targets above (10,000 searches per second against 1,000 bookings per second) narrow that to 10:1, but reads dominate either way, and that imbalance is the defining number for this system. Almost every architectural decision traces back to it: the search path needs aggressive caching and a read-optimized index; the booking path needs ACID transactions and row-level locking. These are opposite requirements that must be served by different storage systems. Combining them in one service or one database is the design mistake this article is built around avoiding.
Core Entities
- Hotel: A property with name, geo-coordinates, address, and star rating. One hotel has many room types.
- RoomType: A category within a hotel (Deluxe Queen, King Suite). Holds base price, max occupancy, and amenity list. One room type has many physical rooms.
- Room: A specific bookable unit linked to a room type. Room 214, Room 412. The unit that appears in a confirmed booking.
- Reservation: A temporary hold created when checkout begins. Expires automatically if payment does not complete within a configurable window (15 minutes by default).
- Booking: A confirmed, paid reservation. The permanent record of a completed stay. Links a specific Room, a User, the stay dates, total price, and a payment reference.
- User: A guest account with contact info, payment methods, and booking history.
Full schema details, including the date-range index strategy and the room_type_inventory table design, come up in the deep dives. These six entities are enough to drive the API and High-Level Design sections without getting lost in column types.
API Design
FR 1: Search for available rooms:
# Returns available hotels matching the search criteria; served from the search index
GET /v1/hotels/search
Query: location, check_in_date, check_out_date, guests, page_cursor?
Response: {
  results: [
    { hotel_id, name, rating, address, available_room_types: [...], lowest_price },
    ...
  ],
  next_cursor: "eyJpZCI6Mj..."
}
GET over POST because this is a read with filter parameters. Cursor-based pagination handles open-ended result sets without OFFSET performance degradation at depth. The available_room_types field is pre-computed from the availability index, so this endpoint never scans the reservations table on the hot path.
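One common way to build an opaque cursor is to base64-encode the sort position of the last row returned. The sketch below assumes that encoding (the sample next_cursor value is consistent with base64-encoded JSON, but the article does not specify the format); encode_cursor and decode_cursor are hypothetical helper names.

```python
import base64
import json
from typing import Any


def encode_cursor(position: dict[str, Any]) -> str:
    """Serialize the last-seen sort position into an opaque URL-safe token."""
    raw = json.dumps(position, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> dict[str, Any]:
    """Recover the sort position; raises ValueError on a malformed token."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
        return json.loads(raw)
    except (ValueError, UnicodeError) as exc:
        # binascii.Error and json.JSONDecodeError are both ValueError subclasses.
        raise ValueError("invalid page cursor") from exc
```

The Search Service would then resume the query after the decoded position rather than skipping rows with OFFSET, so deep pages cost the same as the first one.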
FR 1b: View room type availability detail:
# Detailed availability for a specific room type and date range
GET /v1/hotels/{hotel_id}/room-types/{room_type_id}/availability
Query: check_in_date, check_out_date
Response: { room_type_id, available_count, price_per_night, total_price }
This endpoint drives the "how many rooms of this type are still open for my dates?" panel on the hotel detail page. The available_count is computed from a dedicated inventory table (see Deep Dive 1) rather than by joining against the full reservations history.
FR 2: Create a reservation (start checkout):
# Atomically holds one room of the requested type; returns reservation with expiry
POST /v1/reservations
Body: { room_type_id, check_in_date, check_out_date, user_id, idempotency_key }
Response: {
  reservation_id: "res_abc123",
  room_id: "room_214",
  expires_at: "2026-03-29T12:15:00Z",
  total_price: 450
}
idempotency_key is client-generated (a UUID v4) and required on every call. If a network timeout causes the client to retry, the server returns the original reservation instead of creating a duplicate. The server assigns a specific room_id from the available pool of the requested room_type_id. Two concurrent requests for the same room_type_id on the same dates must each receive a different room_id or one must receive 409 Conflict.
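The retry behaviour can be sketched with an in-memory cache standing in for Redis. Everything here is illustrative: IdempotencyCache and create_reservation are hypothetical names, and the check-then-store sequence ignores the race between two in-flight requests carrying the same key, which a real store would have to close with an atomic set-if-absent.

```python
import time
from typing import Callable, Optional


class IdempotencyCache:
    """In-memory stand-in for the Redis idempotency store (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: treat the key as never seen
            return None
        return response

    def put(self, key: str, response: dict) -> None:
        self._entries[key] = (self._clock(), response)


def create_reservation(cache: IdempotencyCache, idempotency_key: str,
                       do_create: Callable[[], dict]) -> dict:
    """Return the cached response on a retry; otherwise create and cache it."""
    cached = cache.get(idempotency_key)
    if cached is not None:
        return cached
    response = do_create()  # in the real system: the locked database transaction
    cache.put(idempotency_key, response)
    return response
```

A retry with the same key returns the stored response and never reaches do_create, so a timed-out request cannot place a second hold.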
FR 2b: Confirm booking (complete checkout):
# Processes payment and converts the reservation to a confirmed booking
POST /v1/reservations/{reservation_id}/confirm
Body: { payment_method_id: "pm_abc" }
Response: { booking_id: "bk_def", room_id, check_in_date, check_out_date, total_amount }
The server re-validates that the reservation has not expired before charging. If the reservation is expired, the endpoint returns 410 Gone and the client must restart the checkout flow from the search step.
FR 2c: Cancel a reservation (explicit checkout abandon):
# Explicit release; expired reservations are also released by the background Expiry Worker
DELETE /v1/reservations/{reservation_id}
Response: 204 No Content
The client sends this on explicit cancel. The Expiry Worker handles silently abandoned reservations automatically. Both paths converge on the same state change: room status returns to available.
FR 3: Hotel manager: configure room types and inventory:
# Create a new room type; request triggers creation of N individual Room rows
POST /v1/hotels/{hotel_id}/room-types
Body: { name, max_occupancy, base_price, amenities, room_count }
Response: { room_type_id, rooms_created: 12 }
# Update pricing or mark a room type as inactive for maintenance
PATCH /v1/hotels/{hotel_id}/room-types/{room_type_id}
Body: { base_price?, is_active? }
Response: { room_type_id, updated_fields: [...] }
PATCH over PUT because managers rarely update all fields at once. room_count in the POST body causes the server to generate that many Room rows in the same transaction, keeping the room type and its physical rooms atomically consistent.
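A minimal sketch of the "room type plus N rooms in one transaction" idea, using SQLite from the standard library in place of PostgreSQL so it runs anywhere. The table layout and the create_room_type helper are assumptions for illustration, not the article's schema.

```python
import sqlite3


def create_room_type(conn: sqlite3.Connection, hotel_id: int, name: str,
                     base_price: int, room_count: int) -> int:
    """Insert a room type and its physical rooms in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO room_types (hotel_id, name, base_price) VALUES (?, ?, ?)",
            (hotel_id, name, base_price),
        )
        room_type_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO rooms (room_type_id, room_number) VALUES (?, ?)",
            [(room_type_id, n) for n in range(1, room_count + 1)],
        )
    return room_type_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE room_types (
        room_type_id INTEGER PRIMARY KEY,
        hotel_id     INTEGER NOT NULL,
        name         TEXT    NOT NULL,
        base_price   INTEGER NOT NULL
    );
    CREATE TABLE rooms (
        room_id      INTEGER PRIMARY KEY,
        room_type_id INTEGER NOT NULL REFERENCES room_types(room_type_id),
        room_number  INTEGER NOT NULL,
        UNIQUE (room_type_id, room_number)
    );
""")
```

If any Room insert fails, the RoomType row is rolled back with it, so a manager never ends up with a room type that has fewer rooms than requested.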
High-Level Design
1. Hotel search
The search path: a user submits location and date filters; the system returns a paginated list of hotels with available room types and lowest price.
The naive approach is a SQL query against the reservations table: find all hotels near the location, for each hotel find which rooms are booked in the date range, subtract from total, return what remains. At 50M DAU and 10,000 searches per second this is a full table scan on a 500M-row table for every request. It never works.
I always start by showing this naive approach on the whiteboard and letting the interviewer see why it fails. Jumping straight to Elasticsearch makes you look like you memorized the answer.
I always split search from booking before drawing any other boxes on the whiteboard, because keeping them in one service forces you to optimize for opposite workloads simultaneously. Search is read-heavy, geospatial, and tolerant of a few seconds of staleness. Booking is write-bound, transactional, and must be immediately consistent.
Components:
- Client: Web or mobile app sending search queries via the API Gateway.
- API Gateway: Routes /v1/hotels/search traffic to the Search Service; handles auth token validation and rate limiting.
- Search Service: Stateless service that applies geo and availability filters against Elasticsearch and returns paginated results.
- Elasticsearch (Search Index): Hotel documents with geo-coordinates indexed for bounding-box queries and pre-aggregated availability counts per date range. Updated asynchronously when bookings are created or cancelled.
- Hotels DB (PostgreSQL): Source of truth for hotel metadata, room types, and the inventory counter table. Elasticsearch is populated from here via a change event pipeline.
Request walkthrough:
- Client sends GET /v1/hotels/search?location=NYC&check_in_date=2026-04-01&check_out_date=2026-04-03&guests=2.
- API Gateway authenticates the request and routes to the Search Service.
- Search Service queries Elasticsearch: geo-filter by bounding box around the requested location, filter available_count[2026-04-01] > 0 AND available_count[2026-04-02] > 0 for all nights in the range, sort by rating or price.
- Elasticsearch returns matching hotel documents with pre-aggregated availability per room type.
- Search Service returns paginated results with a cursor for the next page. No PostgreSQL reads on this path.
The search path never touches the reservations table directly. Availability counts in Elasticsearch are maintained by a background pipeline that processes booking events. The booking path comes next.
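The per-night filter in the walkthrough depends on one convention: a stay occupies every night from check-in up to, but not including, check-out. A small sketch of that rule follows, with plain dictionaries standing in for the indexed availability counts (stay_nights and is_available are hypothetical names):

```python
from datetime import date, timedelta


def stay_nights(check_in: date, check_out: date) -> list[date]:
    """Nights occupied by a stay: check-in inclusive, check-out exclusive."""
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


def is_available(available_by_night: dict[date, int],
                 check_in: date, check_out: date) -> bool:
    """True only if every night of the stay has at least one room left."""
    return all(available_by_night.get(night, 0) > 0
               for night in stay_nights(check_in, check_out))
```

For the example request (check-in 2026-04-01, check-out 2026-04-03) the nights checked are April 1st and April 2nd, matching the two filter clauses in the walkthrough.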
2. Room reservation and payment
The booking path: a user selects a room type; the system assigns a specific room, creates a reservation with an expiry, takes payment, and confirms the booking.
Adding a dedicated Booking Service keeps booking logic isolated from search. Both services scale independently: search is read-heavy (100:1) while booking is write-bound with ACID requirements. Merging them means a surge in search traffic degrades booking write latency, which is the opposite of what you want.
Components (added):
- Booking Service: Orchestrates reservation creation, room assignment (with concurrency control), and payment confirmation. The concurrency mechanism is treated as a black box here and covered in detail in Deep Dive 2.
- Payment Service: Wraps the external payment processor (Stripe). Returns a payment_intent_id for durability.
- Bookings DB (PostgreSQL): Stores reservations and bookings tables separately from hotel metadata. Handles the ACID transaction for the booking write and row-level locks.
- Expiry Worker: Background job that runs every 30 seconds and releases expired reservation holds back to available inventory.
- Redis: Stores idempotency key responses and room hold pre-filter keys. Non-authoritative; the database is always the source of truth.
Request walkthrough (create reservation):
- User selects a room type and clicks "Reserve". Client sends POST /v1/reservations with an idempotency key.
- Booking Service checks the idempotency key in Redis. If found, returns the cached response immediately without a DB round-trip.
- Booking Service selects one available room_id of the requested type for the date range, applying a row-level lock to prevent concurrent assignment.
- Booking Service inserts a Reservation row with expires_at = NOW() + 15 minutes and decrements the available inventory counter.
- Returns reservation_id, room_id, expires_at, and total price to the client.
Request walkthrough (confirm booking):
- User completes the payment form and sends POST /v1/reservations/{id}/confirm with a payment method.
- Booking Service re-validates that the reservation has not expired.
- Payment Service calls Stripe and receives a payment_intent_id.
- Booking Service runs an atomic transaction: INSERT the Booking row, DELETE the Reservation row, COMMIT. If the DB write fails after the charge succeeds, a reconciliation job detects the orphaned intent and retries.
- Booking Service publishes a booking_confirmed event to update the Elasticsearch availability index.
- Returns confirmed booking details to the client.
The inventory counter is decremented once, when the reservation is created; the transaction that converts a reservation into a booking does not decrement it again. The exact concurrency mechanism preventing two users from receiving the same room during concurrent requests is deferred to Deep Dive 2. The booking_confirmed event that updates the Elasticsearch availability index requires Kafka, which is introduced in Section 3.
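The ordering in the confirm walkthrough (validate expiry, charge, then write) can be sketched with the payment call and the database write injected as plain callables. All names here are hypothetical, and the orphaned_intents list is only a stand-in for whatever the reconciliation job actually scans.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Reservation:
    reservation_id: str
    expires_at: datetime


class ReservationExpired(Exception):
    """Maps to HTTP 410 Gone in the API layer."""


def confirm_booking(
    reservation: Reservation,
    charge: Callable[[], str],            # returns a payment_intent_id
    write_booking: Callable[[str], str],  # returns a booking_id, may raise
    orphaned_intents: list[str],          # stand-in for the reconciliation input
    now: datetime,
) -> str:
    if now >= reservation.expires_at:
        # Never charge against a hold that has already lapsed.
        raise ReservationExpired(reservation.reservation_id)
    payment_intent_id = charge()
    try:
        return write_booking(payment_intent_id)
    except Exception:
        # The charge succeeded but the booking row did not commit:
        # record the intent so reconciliation can retry the write.
        orphaned_intents.append(payment_intent_id)
        raise
```

Checking expiry before calling charge is what keeps the 410 path free of refunds; the only case that needs reconciliation is a failure after the charge.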
3. Hotel manager configuration
The admin path: a hotel manager creates room types, updates prices, and closes rooms for maintenance. These writes must propagate to the search index.
I route admin operations through a dedicated Admin Service rather than the Booking Service. Managers represent a tiny fraction of traffic (thousands of requests per day, not per second), but their writes have the broadest side effects: a price change must update Elasticsearch, a room closure must update availability counts, and a new room type must atomically generate individual Room rows.
Components (added):
- Admin Service: Handles hotel management operations. Validates manager role via the API Gateway before any write.
- Kafka: Admin writes publish events (price_updated, room_type_created, room_closed). Downstream consumers react asynchronously.
- Search Index Updater: Kafka consumer that applies hotel metadata and availability changes to Elasticsearch. Runs independently of the booking path.
Request walkthrough:
- Hotel manager opens the management dashboard and sends POST /v1/hotels/{id}/room-types with room details and count.
- API Gateway validates the manager's auth token and confirms the hotel_manager role.
- Admin Service writes the new RoomType, all Room rows, and initial room_type_inventory rows to Hotels DB in one atomic transaction.
- Admin Service publishes a room_type_created event to Kafka with the new inventory counts.
- Search Index Updater consumes the event and updates the Elasticsearch document for this hotel.
- Future search queries see the new room type within a few seconds.
Admin writes have a few seconds of propagation lag before the search index reflects the change. A new room type appearing in search results 3 seconds after creation is not a business problem.
4. Double-booking prevention (overview)
FR4 is not a separate service; it is a correctness constraint on the booking write path.
The naive race condition: two users search simultaneously and both see the last available Deluxe Queen room for April 1st to April 3rd. Both click "Reserve" within milliseconds. Both requests arrive at the Booking Service at nearly the same instant. Both queries read available_count = 1, both pass the availability check, and both insert a reservation for the same room. One physical room is now holding two reservations.
The full solution has three layers: an efficient availability data model that makes the "is this room free?" query fast (Deep Dive 1), pessimistic row locking that makes the inventory decrement atomic (Deep Dive 2), and a checkout hold window with automatic expiry that prevents permanently blocking inventory when users abandon checkout (Deep Dive 3).
Components:
- Booking Service: Acquires row-level locks on room_type_inventory before allowing any reservation write.
- room_type_inventory: The locked resource. Two concurrent transactions compete for the same (room_type_id, stay_date) rows.
- Redis: Pre-filter for idempotency keys. Prevents duplicate requests from reaching the database lock path entirely.
Request walkthrough (concurrent double-booking attempt):
- Users A and B simultaneously search and both see the last Deluxe Queen room available for April 1st to April 3rd.
- Both send POST /v1/reservations within milliseconds of each other.
- Booking Service checks idempotency keys in Redis. Both keys are new.
- User A's transaction acquires SELECT FOR UPDATE on the inventory rows for April 1st and April 2nd first.
- User B's transaction attempts the same lock. PostgreSQL blocks User B until User A commits or rolls back.
- User A: availability check passes (MIN available = 1). Assigns room 214. Commits.
- User B: lock releases. Availability check runs again. MIN available = 0. Returns 409 Conflict.
User A acquires the row lock first. User B blocks at the database level until A commits, then re-evaluates availability and receives 409 Conflict. Unlike application-level checks, this is enforced by the database regardless of how many app server instances are running.
Database-level locking (SELECT FOR UPDATE) is the correctness guarantee. Redis is an optimization that reduces the number of requests reaching the database, but it is not the source of truth. Never skip the database lock and rely on Redis alone.
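The lock-then-recheck sequence can be simulated without a database. In this sketch a threading.Lock per inventory row plays the role of the PostgreSQL row lock, and two threads play Users A and B; it is an analogy for SELECT FOR UPDATE, not a replacement for it, and InventoryRow and reserve are hypothetical names.

```python
import threading
from datetime import date


class InventoryRow:
    """One (room_type_id, stay_date) row; the lock stands in for a row lock."""

    def __init__(self, total_rooms: int):
        self.total_rooms = total_rooms
        self.booked_rooms = 0
        self.lock = threading.Lock()


def reserve(inventory: dict[date, InventoryRow], nights: list[date]) -> bool:
    """True if a room was held for every night; False maps to 409 Conflict."""
    rows = [inventory[night] for night in sorted(nights)]  # ascending lock order
    for row in rows:
        row.lock.acquire()  # blocks, like SELECT ... FOR UPDATE
    try:
        if min(r.total_rooms - r.booked_rooms for r in rows) < 1:
            return False  # availability is re-checked under the lock
        for row in rows:
            row.booked_rooms += 1
        return True
    finally:
        for row in reversed(rows):
            row.lock.release()  # released at "commit"


nights = [date(2026, 4, 1), date(2026, 4, 2)]
inventory = {night: InventoryRow(total_rooms=1) for night in nights}
results: list[bool] = []
threads = [threading.Thread(target=lambda: results.append(reserve(inventory, nights)))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one request wins the last room; the other sees zero availability.
print(sorted(results))
```

Whichever thread acquires the locks first succeeds; the second runs its availability check only after the first has released, so it always sees the updated count.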
Potential Deep Dives
1. How do you check room availability efficiently for a date range?
The core question is: given a hotel, a room type, and a date range of N nights, how many rooms of that type are still available? At 10,000 searches per second and 500M stored booking rows, this query cannot scan the bookings table on every request. There are three generations of solutions, and each one addresses the failure mode of the previous.
I'd walk through all three options in the interview even if time is tight. The progression from naive scan to boolean table to counter table shows you understand why the final answer works, not just what it is.
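A sketch of the counter-table query, run against SQLite so it is self-contained. The column names (room_type_id, stay_date, total_rooms, booked_rooms) follow the article; the sample rows, the available_count helper, and the rule that a missing night counts as unavailable are assumptions.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE room_type_inventory (
        room_type_id INTEGER NOT NULL,
        stay_date    TEXT    NOT NULL,   -- ISO date, one row per night
        total_rooms  INTEGER NOT NULL,
        booked_rooms INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (room_type_id, stay_date)
    )
""")
conn.executemany(
    "INSERT INTO room_type_inventory VALUES (?, ?, ?, ?)",
    [(7, "2026-04-01", 5, 0), (7, "2026-04-02", 5, 5), (7, "2026-04-03", 5, 1)],
)


def available_count(conn: sqlite3.Connection, room_type_id: int,
                    check_in: str, check_out: str) -> int:
    """Rooms free for the whole stay; 0 if any night is sold out or missing."""
    expected_nights = (date.fromisoformat(check_out) - date.fromisoformat(check_in)).days
    if expected_nights <= 0:
        raise ValueError("check_out must be after check_in")
    min_free, nights_found = conn.execute(
        """
        SELECT MIN(total_rooms - booked_rooms), COUNT(*)
        FROM room_type_inventory
        WHERE room_type_id = ? AND stay_date >= ? AND stay_date < ?
        """,
        (room_type_id, check_in, check_out),
    ).fetchone()
    if nights_found != expected_nights:
        return 0  # a night with no inventory row cannot be sold
    return max(min_free, 0)
```

The query reads one row per night through the primary key, so a three-night stay touches three rows regardless of how many reservations exist. With the sample data, a stay covering April 2nd is unavailable even though the neighbouring nights have free rooms.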
2. How do you prevent two users from booking the same room simultaneously?
Room-level inventory is a shared mutable resource. At 1,000 bookings per second across multiple stateless app server instances, two requests that each read "1 room available" can both proceed and produce a double-booking. This is the same race condition as ticket booking, but with an added dimension: hotel rooms are instances within a type (not unique seats), so the problem has two layers: inventory contention at the room-type level, and specific room assignment at the physical room level.
I'd call this out as the signature moment of the hotel booking interview. If you get concurrency control right, the interviewer knows you can build real systems. If you skip it, nothing else matters.
I always resolve the concurrency model before any other part of the design in this interview, because getting it wrong here makes every other choice irrelevant.
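The second layer, picking a specific physical room without waiting on rooms another request is already working on, corresponds to PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED. The in-process analogy below uses a non-blocking lock acquire to get the same "skip, don't wait" behaviour; the Room class and assign_room are hypothetical, and unlike a real transaction the lock here is released as soon as the room is marked held.

```python
import threading
from typing import Optional


class Room:
    def __init__(self, room_id: str):
        self.room_id = room_id
        self.lock = threading.Lock()  # stand-in for a row lock on the Room row
        self.held = False


def assign_room(rooms: list[Room]) -> Optional[str]:
    """Pick the first free room, skipping any that another request has locked."""
    for room in rooms:
        if not room.lock.acquire(blocking=False):
            continue  # locked elsewhere: skip instead of waiting
        try:
            if not room.held:
                room.held = True
                return room.room_id
        finally:
            room.lock.release()
    return None  # every room is held or locked
```

Because a locked room is skipped rather than waited on, two concurrent requests for the same room type each walk away with a different room instead of queueing behind one another.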
3. How do you hold a room during checkout without permanently blocking inventory?
A reservation window creates a holding problem: the room is assigned but not yet paid for. Other users who search during this window should not see the room as available. An abandoned checkout must release the room automatically so it re-appears in search results. This is the checkout hold timer pattern, and it is the third layer of the double-booking prevention stack.
I'd frame this problem as "what happens when the user closes their laptop mid-checkout?" That question immediately makes the interviewer see why a hold timer with automatic expiry is necessary, not just nice-to-have.
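A sketch of an idempotent release step for the hold timer, again on SQLite so it runs as written. It assumes the reservations and room_type_inventory tables live in one database and that timestamps are stored as ISO strings in a single format; release_expired and the sample rows are illustrative.

```python
import sqlite3


def release_expired(conn: sqlite3.Connection, reservation_id: str, now_iso: str) -> bool:
    """Delete one expired hold and return its inventory; safe to call twice."""
    with conn:  # one transaction: delete the hold and release inventory together
        stay = conn.execute(
            "SELECT room_type_id, check_in_date, check_out_date FROM reservations "
            "WHERE reservation_id = ? AND expires_at < ?",
            (reservation_id, now_iso),
        ).fetchone()
        if stay is None:
            return False  # not expired yet, or already released
        deleted = conn.execute(
            "DELETE FROM reservations WHERE reservation_id = ? AND expires_at < ?",
            (reservation_id, now_iso),
        ).rowcount
        if deleted == 0:
            return False  # another worker won the race; leave inventory alone
        conn.execute(
            "UPDATE room_type_inventory SET booked_rooms = booked_rooms - 1 "
            "WHERE room_type_id = ? AND stay_date >= ? AND stay_date < ?",
            stay,
        )
    return True


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE room_type_inventory (
        room_type_id INTEGER, stay_date TEXT,
        total_rooms INTEGER, booked_rooms INTEGER,
        PRIMARY KEY (room_type_id, stay_date));
    CREATE TABLE reservations (
        reservation_id TEXT PRIMARY KEY, room_type_id INTEGER,
        check_in_date TEXT, check_out_date TEXT, expires_at TEXT);
    INSERT INTO room_type_inventory VALUES
        (7, '2026-04-01', 5, 1), (7, '2026-04-02', 5, 1);
    INSERT INTO reservations VALUES
        ('res_abc123', 7, '2026-04-01', '2026-04-03', '2026-03-29T12:15:00Z');
""")
```

Inventory is only touched when the DELETE actually removed a row, so running the worker twice, or running two workers at once, releases each hold exactly once.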
Final Architecture
The complete system after all three deep dives combines the optimized search path (Elasticsearch with pre-aggregated inventory), the ACID booking path with pessimistic locking and idempotency keys, the Redis pre-filter for checkout holds, the Expiry Worker for automatic release, and the Kafka pipeline that keeps the search index consistent with booking state.
Two separate PostgreSQL databases keep the write domains isolated. Hotels DB is updated infrequently by manager operations. Bookings DB absorbs the full concurrency of the booking path with row-level locks. Elasticsearch handles all search traffic independently, never competing for database connections with transactional writes.
The most important architectural insight: Redis sits in front of PostgreSQL on the write path as a pre-filter, not as a replacement. Every request that passes the Redis pre-filter still completes a SELECT FOR UPDATE PostgreSQL transaction. Correctness lives in the database; Redis is a performance optimization that can be bypassed without breaking the system.
Interview Cheat Sheet
- The six core entities are Hotel, RoomType, Room, Reservation, Booking, and User. RoomType is the category (Deluxe Queen); Room is the physical unit (Room 214). Reservation is temporary with expires_at; Booking is permanent with a payment_intent_id. Keeping Reservation and Booking as separate rows makes the state machine clean and the expiry worker simple.
- Use two separate PostgreSQL databases. Hotels DB holds hotel metadata, room types, rooms, and the inventory counter table. Bookings DB holds reservations and confirmed bookings. Separating write domains prevents the high-concurrency booking path from competing with admin writes for connections and lock tables.
- Never run availability queries against the reservations table at 10,000 searches/second. Use a room_type_inventory table with one row per (room_type_id, date) storing total_rooms and booked_rooms. Availability is MIN(total_rooms - booked_rooms) across the date range. Mirror these counts into Elasticsearch for the search path.
- The binding constraint for multi-night availability is the tightest single night. Use MIN(total_rooms - booked_rooms) over the date range. A 7-night stay where one night has 0 rooms available must return unavailable even if every other night has 5 rooms free. SUM or AVG gives a misleading positive result.
- Prevent double-booking with SELECT FOR UPDATE on inventory rows + SKIP LOCKED on room assignment. Lock the room_type_inventory rows for the date range first. Then assign a specific physical room using SKIP LOCKED, which skips rooms already locked by a concurrent transaction so two users each get a different room without blocking.
- Always sort stay_date ASC before acquiring FOR UPDATE locks. Two concurrent transactions locking the same inventory rows in different date orders can deadlock. Consistent ascending order eliminates this class of deadlock with no coordination overhead.
- Idempotency keys are required on POST /reservations. Client-generated UUID v4, cached in Redis for 20 minutes. If a network timeout causes the client to retry, return the original reservation response instead of creating a second hold. Without this, every retry during a slow network creates a new reservation for a new room.
- The checkout hold timer lives in PostgreSQL, not Redis. A reservations row with an indexed expires_at column is the authoritative hold. Redis stores a hold:room:{room_id}:{date} key as a pre-filter that short-circuits most requests before the database. If Redis is unavailable, the booking path falls back to database-only. Correctness must not depend on Redis uptime.
- The Expiry Worker must be idempotent. Use DELETE FROM reservations WHERE reservation_id = :id AND expires_at < NOW() and check rows_affected before decrementing booked_rooms. If rows_affected is 0, another worker instance already handled this reservation. Roll back without touching inventory.
- The 100:1 read/write ratio is the single most important number in this design. Search traffic (100 parts) must never touch the same database as booking writes (1 part). Elasticsearch handles search. PostgreSQL handles writes. This separation prevents a search spike from degrading booking write latency, which is the failure mode that loses revenue.
- At 1,000 bookings/second and 50M DAU, a single Bookings DB PostgreSQL primary with connection pooling (PgBouncer) handles the write load. Read replicas serve booking history reads. The Hotels DB scales with read replicas for room metadata lookups. Elasticsearch handles all 10,000 search requests/second with horizontal sharding.
- Partial payment failure is handled by storing payment_intent_id before committing the booking row. If the database write fails after a successful Stripe charge, a reconciliation job scans for payment intents with no corresponding booking row and either retries the write or issues a refund. Never assume a successful charge means the write will succeed.
- Elasticsearch availability counts must be slightly pessimistic. Decrement min_available in Elasticsearch when a reservation is created, not when it is confirmed. Re-increment when the reservation expires. This prevents search from showing "1 room available" to new users while a concurrent checkout already holds that last room.
- The two hardest interview moments for this question: explaining why MIN(total_rooms - booked_rooms) over a date range is the correct availability check (and why it handles the binding-night constraint); and explaining why Redis alone is not a safe hold mechanism (no durability on crash, double-booking risk on failover, no inventory event on TTL expiry). Both of these distinguish candidates who have worked through the problem from those reciting patterns.