Google Calendar
Design a scalable calendar service that handles event creation, recurring schedules, free/busy queries, and real-time UI updates for hundreds of millions of users.
What is a calendar service?
A calendar service stores events and schedules for users and helps them coordinate with others. The apparent simplicity hides two hard problems: (1) modeling recurring events without storing one database row per occurrence (a weekly meeting for two years is 104 occurrences but only one logical rule), and (2) efficiently querying "show me this user's events between Monday and Sunday" when recurring events must be expanded dynamically at query time. These two problems, recurrence modeling and range query efficiency, drive every significant architectural decision in this design.
I'd frame this question to the interviewer by saying: "The interesting part is not CRUD on events. The interesting part is what happens when you type FREQ=WEEKLY into a recurrence rule and then try to query a date range six months away." That framing immediately signals that you see the real problem.
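To make the recurrence problem concrete, here is a minimal stdlib-only sketch of query-time expansion for the simplest rule, FREQ=WEEKLY. It is an illustration of the idea, not production code; a real implementation would use an RFC 5545 library (python-dateutil's rrule module, for example) to handle the full grammar.

```python
from datetime import datetime, timedelta

def expand_weekly(dtstart, window_start, window_end):
    """Expand a FREQ=WEEKLY rule into occurrences inside [window_start, window_end)."""
    # Jump straight to the first occurrence at or after the window start
    # instead of walking week by week from the series start.
    if dtstart >= window_start:
        occ = dtstart
    else:
        weeks_ahead = (window_start - dtstart).days // 7
        occ = dtstart + timedelta(weeks=weeks_ahead)
        if occ < window_start:
            occ += timedelta(weeks=1)
    out = []
    while occ < window_end:
        out.append(occ)
        occ += timedelta(weeks=1)
    return out

# Weekly Tuesday 10:00 meeting created in April 2026, queried for June 2026.
# The rule is stored once; only the five June occurrences are materialized.
occurrences = expand_weekly(datetime(2026, 4, 7, 10, 0),
                            datetime(2026, 6, 1), datetime(2026, 7, 1))
```

The key property: expansion cost is proportional to the query window, not to the lifetime of the series.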
Functional Requirements
Core Requirements
- Users can create, update, and delete calendar events (title, start time, end time, location, description).
- Events can recur on a schedule (daily, weekly, monthly, custom RRULE per the iCalendar spec).
- Users can invite others to events; invitees can accept or decline.
- Users can query their calendar for a time range (for example, all events in January).
Below the Line (out of scope)
- Video conferencing integration (Zoom/Meet links)
- Smart scheduling AI (find free time for all attendees)
- Calendar sharing and delegation (view another user's full calendar)
The hardest part in scope: Recurring events. Storing the rule once vs expanding all instances is the central schema design tension, and it cascades into every downstream decision: range queries, "update all future occurrences" operations, per-occurrence exceptions, and free/busy computation.
Video conferencing integration is below the line because it does not change the event storage or query path. To add it, store a conference_url field on the event and call the provider API (Zoom or Meet) on event creation to generate the link. The URL is metadata; it does not affect recurrence logic.
Smart scheduling AI is below the line because it is a separate read service that sits above the free/busy layer. To add it, query all attendees' free/busy bitmaps, compute the intersection, and return suggested time slots. The underlying free/busy data we build in this design is already the required input.
Calendar sharing and delegation is below the line because it requires a permission model (owner, viewer, editor) and row-level access checks on every calendar query. To add it, maintain a calendar_permissions table mapping (owner_id, grantee_id, permission_level) and gate every query on that table. The event schema does not need to change.
Non-Functional Requirements
Core Requirements
- Scale: 500M users, each averaging 50 events per month, gives roughly 25B events total. 100M DAU querying their calendars at peak produces around 10M range queries per second.
- Latency: Calendar range queries return in under 200ms p99. Event creation completes in under 500ms p99.
- Availability: 99.99% uptime. A user who cannot see their own calendar is a critical failure.
- Consistency: Eventual consistency is acceptable for read replicas (a newly created event appearing within 1-2 seconds is fine). Writes are strongly consistent on the primary.
Below the Line
- Sub-millisecond query latency (requires hot in-memory serving for all 25B events, not practical at this scale)
- Multi-region strong consistency (cross-region replication lag is acceptable under the eventual consistency model)
Read/write ratio: Calendar is read-heavy at roughly 10 reads per write. Users browse their calendar far more often than they create events. This ratio directly determines the caching strategy: pre-computing and caching the "next 30 days" snapshot per user is viable because writes are infrequent enough to make cache invalidation cheap.
The 10:1 ratio tells you that the read path deserves most of the optimization effort. Adding read replicas, caching pre-expanded event windows in Redis, and putting an index on (user_id, start_time) all pay off immediately. The write path does not need a queue or stream processor because 1M writes per second (1/10th of reads) is well within primary database capacity with proper indexing.
I always call out the ratio early in a calendar interview because it kills the temptation to add Kafka or a message queue for writes. Unlike a top-K system or analytics pipeline, calendar writes are not the bottleneck. Spending time on write optimization here is time stolen from the read path, where it actually matters.
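The capacity figures above (all assumptions of this design, not measured numbers) reduce to a few lines of arithmetic:

```python
# Back-of-envelope check of the stated scale numbers.
users = 500_000_000
events_per_user_per_month = 50
total_events = users * events_per_user_per_month   # ~25B event rows
reads_per_sec = 10_000_000                         # peak range queries
writes_per_sec = reads_per_sec // 10               # from the 10:1 read/write ratio
```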
Core Entities
- Event: The core event record. Carries event_id, creator_id, title, start_time, end_time, timezone, location, description, and rrule (null for one-off events, an RRULE string for recurring events). Schema details and indexes are deferred to the deep dives.
- Attendee: The join between an event and a user. Carries event_id, user_id, and status (pending, accepted, declined). One row per invited participant.
- RecurrenceException: An override for one specific occurrence of a recurring event. Carries base_event_id, original_occurrence_date, modified_event_id (points to a one-off event with the exception's fields), and is_deleted (true when the occurrence is cancelled rather than modified).
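A sketch of these three entities as tables, using SQLite for self-containment (column types and the creator_id-based index are simplifying assumptions; the real design indexes on the querying user, which includes attendees):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
  event_id             TEXT PRIMARY KEY,
  creator_id           TEXT NOT NULL,
  title                TEXT NOT NULL,
  start_time           TEXT NOT NULL,   -- UTC, ISO-8601
  end_time             TEXT NOT NULL,
  timezone             TEXT NOT NULL,   -- IANA name, e.g. America/New_York
  location             TEXT,
  description          TEXT,
  rrule                TEXT,            -- NULL for one-off events
  next_occurrence_date TEXT             -- advanced by a background job
);
CREATE INDEX idx_events_range ON events (creator_id, start_time, end_time);
CREATE INDEX idx_events_next  ON events (creator_id, next_occurrence_date);

CREATE TABLE attendees (
  event_id TEXT NOT NULL,
  user_id  TEXT NOT NULL,
  status   TEXT NOT NULL DEFAULT 'pending',  -- pending | accepted | declined
  PRIMARY KEY (event_id, user_id)
);

CREATE TABLE recurrence_exceptions (
  base_event_id            TEXT NOT NULL,
  original_occurrence_date TEXT NOT NULL,
  modified_event_id        TEXT,             -- NULL when the occurrence is deleted
  is_deleted               INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (base_event_id, original_occurrence_date)
);
""")
```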
API Design
FR 1 and FR 2 - Create an event (with optional recurrence):
POST /events
Body: {
title: "Weekly sync",
start_time: "2026-04-07T10:00:00Z",
end_time: "2026-04-07T10:30:00Z",
timezone: "America/New_York",
rrule: "FREQ=WEEKLY;BYDAY=TU", // null for one-off events
attendees: ["user_456", "user_789"],
location: "Conference Room B"
}
Response: HTTP 201 Created
Body: { event_id: "evt_abc123" }
POST because we are creating a new resource. The rrule field follows the iCalendar RRULE spec (RFC 5545), which is the standard calendar interchange format most client libraries already understand. Returning event_id lets the client immediately open a WebSocket subscription for that event's updates.
FR 4 - Get events in a time range:
GET /calendar/{user_id}/events?from=2026-01-01T00:00:00Z&to=2026-01-31T23:59:59Z
Response: HTTP 200 OK
Body: {
events: [
{
event_id: "evt_abc123",
title: "Weekly sync",
start_time: "2026-01-06T10:00:00Z",
end_time: "2026-01-06T10:30:00Z",
is_recurring_occurrence: true,
base_event_id: "evt_abc123"
}
]
}
The response returns expanded occurrences within the requested range, including the relevant occurrence times for recurring events. is_recurring_occurrence and base_event_id let the client know which events are instances of a recurring series so it can render them correctly.
FR 3 - Update attendee status:
PUT /events/{event_id}/attendees/{user_id}
Body: { status: "accepted" }
Response: HTTP 200 OK
PUT because we are updating a specific resource at a known URL. status is one of accepted or declined. The server writes to the Attendee table and notifies the event creator asynchronously.
Real-time updates (WebSocket):
WebSocket: ws://host/calendar/{user_id}/events
Server pushes: {
event_id: "evt_abc123",
change_type: "created" | "updated" | "deleted",
occurred_at: "2026-01-01T10:00:00Z"
}
The WebSocket channel scoped to user_id receives push notifications whenever any event on that user's calendar changes (including events where the user is an attendee). The client re-fetches the full event details using the REST API on receiving the notification.
High-Level Design
1. Creating and retrieving one-off events
A single app server backed by a relational database handles basic event creation and range queries, and the index on (user_id, start_time) is what makes those queries fast.
The simplest possible system: a client sends a POST /events request, the app server validates it and writes to a PostgreSQL table, and range queries hit a B-tree index on (user_id, start_time, end_time). No caching, no replication, no recurring events yet.
Components:
- Client: Web or mobile app that sends create/read requests.
- App Server: Validates request payload, writes the event to the database, reads events for range queries.
- PostgreSQL (Primary): Stores events with a composite index on (user_id, start_time, end_time) for efficient range queries.
Request walkthrough (create):
- Client sends POST /events with title, start time, end time, and timezone.
- App Server validates that end_time > start_time and that required fields are present.
- App Server inserts the event into the events table with a generated event_id.
- App Server returns { event_id } to the client.
Request walkthrough (range query):
- Client sends GET /calendar/{user_id}/events?from=...&to=....
- App Server executes:
SELECT * FROM events
WHERE user_id = :user_id
AND start_time < :to
AND end_time > :from
ORDER BY start_time;
- App Server returns the matching events.
This covers one-off events only. The next section handles recurring events, which require a fundamentally different approach.
I'd draw this minimal diagram first and tell the interviewer: "This handles 80% of real calendar usage. The remaining 20%, recurring events, is where the complexity lives." Starting simple and then layering complexity shows design maturity.
2. Recurring events
Storing the RRULE string once and expanding occurrences in the app layer at query time is the correct model, but it requires an indexed next_occurrence_date column and a RecurrenceException table to stay efficient.
The naive approach expands all instances at creation time. That design breaks in multiple dimensions (explored in Deep Dive 1). The evolved approach stores only the recurrence rule and expands in the app layer.
Components (new and changed):
- App Server (recurrence-aware): At query time, expands RRULE strings into occurrences that fall within the query window. Checks the recurrence_exceptions table to apply per-occurrence overrides.
- events table (updated): Adds rrule (nullable string) and next_occurrence_date (indexed date, computed on creation and advanced by a background job).
- recurrence_exceptions table: Stores one row per modified or deleted occurrence. Keyed on (base_event_id, original_occurrence_date).
Request walkthrough (range query with recurring events):
- Client sends GET /calendar/{user_id}/events?from=Monday&to=Sunday.
- App Server runs two queries in parallel:
  - Non-recurring events in range: WHERE user_id = ? AND rrule IS NULL AND start_time < to AND end_time > from
  - Recurring events that may have occurrences in range: WHERE user_id = ? AND rrule IS NOT NULL AND next_occurrence_date <= to
- For each recurring event returned by query 2, App Server expands the RRULE in memory to find all occurrences that fall within the query window.
- For each expanded occurrence, App Server checks the recurrence_exceptions table. If a RecurrenceException row exists for that (base_event_id, original_occurrence_date), the app either substitutes the modified event or removes the occurrence if is_deleted = true.
- App Server merges the two result sets and returns them sorted by start_time.
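The exception-merge step in the walkthrough can be sketched in a few lines. This is a simplified in-memory model (the dict stands in for the recurrence_exceptions table; field names follow the entities above):

```python
from datetime import datetime, timedelta

def apply_exceptions(base_event_id, occurrences, exceptions):
    """Substitute or drop occurrences that have a RecurrenceException row."""
    out = []
    for occ in occurrences:
        exc = exceptions.get((base_event_id, occ))
        if exc is None:
            out.append({"start_time": occ})     # unmodified occurrence
        elif exc["is_deleted"]:
            continue                            # occurrence cancelled
        else:
            out.append(exc["modified"])         # occurrence rescheduled
    return out

# Five weekly occurrences; June 9 is cancelled, June 16 moved to 14:00.
exceptions = {
    ("evt_abc123", datetime(2026, 6, 9, 10, 0)): {"is_deleted": True},
    ("evt_abc123", datetime(2026, 6, 16, 10, 0)): {
        "is_deleted": False,
        "modified": {"start_time": datetime(2026, 6, 16, 14, 0)},
    },
}
raw = [datetime(2026, 6, 2, 10, 0) + timedelta(weeks=i) for i in range(5)]
final = apply_exceptions("evt_abc123", raw, exceptions)
```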
The next_occurrence_date index is the key performance mechanism. Without it, finding all recurring events that may have an occurrence in the query window requires scanning every recurring event row for every user, which is O(N) at 25B rows. With the index, the query resolves to a narrow B-tree scan.
I've seen candidates skip the next_occurrence_date column and try to filter recurring events by start_time alone. That breaks immediately: a weekly meeting created in 2020 has a start_time of 2020, but it absolutely needs to appear in a 2026 query. The indexed forward pointer is the mechanism that makes this work.
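The forward pointer has to be kept current by a background job. A minimal sketch for the weekly case (the interval parameter and job shape are assumptions; a real job would parse the stored RRULE):

```python
from datetime import datetime, timedelta

def advance_next_occurrence(next_occurrence, now, interval_weeks=1):
    """Nightly-job sketch: roll a weekly series' forward pointer past 'now'
    so the next_occurrence_date index stays accurate for range queries."""
    step = timedelta(weeks=interval_weeks)
    while next_occurrence <= now:
        next_occurrence += step
    return next_occurrence

# Pointer still says Tue 2026-06-02; today is 2026-06-20, so it advances to June 23.
ptr = advance_next_occurrence(datetime(2026, 6, 2, 10, 0), datetime(2026, 6, 20))
```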
3. Event invitations and real-time updates
Adding an attendees table and a WebSocket gateway completes the collaboration layer, with a notification service handling invite delivery asynchronously.
When an event is created with attendees, the app server writes to both the events table and the attendees table in a single transaction, then publishes an invitation event to the notification service asynchronously.
Components (new and added):
- attendees table: One row per (event_id, user_id) pair. The authenticated user accepting or declining an invite sends a PUT /events/{event_id}/attendees/{user_id} request.
- Notification Service: Receives invite and update events from the app server, delivers push notifications to invitees (email, mobile push, in-app).
- WebSocket Gateway: Maintains persistent connections to clients. When an event on a user's calendar changes, the gateway pushes a lightweight change notification ({ event_id, change_type }). The client re-fetches full details via REST.
- Redis Pub/Sub: The app server publishes to a Redis channel keyed by user_id. The WebSocket Gateway subscribes to channels for all currently connected users and forwards messages to the appropriate WebSocket connection.
Request walkthrough (create event with attendees):
- Client sends POST /events with attendees: ["user_456", "user_789"].
- App Server opens a transaction: inserts the event row and one attendee row per invitee with status = pending.
- Transaction commits.
- App Server asynchronously publishes an invite message to the Notification Service.
- App Server publishes a created change event on each invitee's Redis pub/sub channel.
- WebSocket Gateway receives the pub/sub message and pushes { event_id, change_type: "created" } to any connected invitee clients.
Request walkthrough (attendee responds):
- Invitee sends PUT /events/{event_id}/attendees/{user_id} with { status: "accepted" }.
- App Server updates the attendee record: UPDATE attendees SET status = 'accepted' WHERE event_id = ? AND user_id = ?.
- App Server publishes a change event to the organizer's notification channel.
- WebSocket Gateway delivers { event_id, change_type: "updated" } to connected organizer clients.
Potential Deep Dives
1. How do you model recurring events in the database?
Three distinct approaches exist: expand all instances at creation time (simple queries, huge storage), store only the RRULE string and expand at query time (compact storage, expensive queries without optimization), or store RRULE plus an indexed next_occurrence_date plus a RecurrenceException table (correct and efficient). I would lead with the third in an interview and briefly name the others to show I considered them.
2. How do you efficiently query "find all events between time A and time B"?
The range query is the most frequent operation in the system (10M/s at peak). Getting the index design right is the difference between a 2ms query and a full-table scan.
I'd spend extra whiteboard time on the overlap predicate (start_time < window_end AND end_time > window_start). Most candidates write start_time >= window_start AND start_time <= window_end, which is wrong because it misses events that start before the window but span into it. Getting this right in the interview is a strong signal.
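The difference between the two predicates is easy to demonstrate with integers standing in for days:

```python
def overlaps(ev_start, ev_end, win_start, win_end):
    # Correct: an event and a window overlap iff each starts before the other ends.
    return ev_start < win_end and ev_end > win_start

def naive(ev_start, ev_end, win_start, win_end):
    # The common mistake: only catches events whose *start* falls inside the window.
    return win_start <= ev_start <= win_end

# Multi-day event spanning days 5-12, queried with a window of days 8-10:
# the correct predicate reports it; the naive one silently drops it.
spans_window = overlaps(5, 12, 8, 10)   # True
missed = naive(5, 12, 8, 10)            # False
```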
3. How do you compute free/busy for event invitations?
When a user invites others to a meeting, the client often wants to show a free/busy grid (which attendees are available at which times). This is a read-intensive computation over all attendees' calendars.
I'd mention the bitmap approach early if the interviewer asks about free/busy, because the jump from "query each attendee's calendar" to "1440 bits per user per day combined with BITOP OR" is the kind of non-obvious insight that separates a good answer from a great one.
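A pure-Python model of the bitmap idea (the byte-level OR stands in for Redis's BITOP; key names and interval shapes are assumptions from this design):

```python
MINUTES_PER_DAY = 1440  # one bit per minute = 180 bytes per user per day

def busy_bitmap(busy_ranges):
    """Build one day's busy bitmap from (start_minute, end_minute) intervals."""
    bits = bytearray(MINUTES_PER_DAY // 8)
    for start, end in busy_ranges:
        for minute in range(start, end):
            bits[minute // 8] |= 1 << (minute % 8)
    return bits

def union_busy(bitmaps):
    """Stand-in for BITOP OR: a 0 bit in the result means every attendee is free."""
    out = bytearray(MINUTES_PER_DAY // 8)
    for bm in bitmaps:
        for i, byte in enumerate(bm):
            out[i] |= byte
    return out

def is_free(bits, minute):
    return not (bits[minute // 8] >> (minute % 8)) & 1

alice = busy_bitmap([(600, 660)])   # busy 10:00-11:00
bob   = busy_bitmap([(630, 690)])   # busy 10:30-11:30
both  = union_busy([alice, bob])    # free slots are the remaining 0 bits
```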
Final Architecture
The complete system integrates event storage, recurring event expansion, the write-through free/busy bitmap, real-time WebSocket delivery, and a read replica fleet for calendar query scale.
Interview Cheat Sheet
- Recurring event model: Store one row per event with an RRULE string (RFC 5545 iCalendar format) and an indexed next_occurrence_date column. Expand occurrences in the app layer at query time starting from next_occurrence_date (not the series start). Never expand all instances at creation time.
- RecurrenceException table: One row per modified or cancelled occurrence, keyed on (base_event_id, original_occurrence_date). The row either points to a modified event (for reschedules) or sets is_deleted = true (for cancellations). Checked during RRULE expansion.
- Range query index: Composite index on (user_id, start_time, end_time). The overlap condition is start_time < window_end AND end_time > window_start. This correctly handles multi-day events, events overlapping the window boundary from the left, and events spanning the entire window.
- next_occurrence_date index: Separate index on (user_id, next_occurrence_date). The recurring event range query filters WHERE rrule IS NOT NULL AND next_occurrence_date <= window_end. This prevents scanning recurring events with no future occurrences in the window.
- Read/write ratio: 10 reads per write. Cache the "next 30 days" per user in Redis with a 5-minute TTL. Invalidate on any event write for that user. Three read replicas absorb the 10M/s read load; the primary handles only writes.
- Free/busy computation: Store free/busy as a Redis BITFIELD keyed by freebusy:{user_id}:{YYYYMMDD}. One bit per minute (1440 bits = 180 bytes per user per day). On event write, pipeline SETBIT for every minute of the event. Multi-attendee availability uses BITOP OR across all attendee busy bitmaps in one Redis command (a 0 bit in the result means every attendee is free), executing in sub-millisecond time.
- "Update all future occurrences": Do not bulk-update individual occurrence rows. Instead, add UNTIL=today to the existing RRULE (ending the old series) and create a new recurring event starting from the next occurrence with the new parameters. This is the iCalendar-compatible approach and avoids multi-row transactions on a potentially large set.
- Attendee invitations: Written transactionally alongside the event creation (same database transaction). One attendee row per invitee with status = pending. Invite delivery to the Notification Service is asynchronous (published after commit), so a notification failure never rolls back the event creation.
- Real-time calendar updates: Redis Pub/Sub channel keyed by user_id. The Calendar Service publishes { event_id, change_type } after every write. WebSocket Gateway nodes subscribe to channels for their connected users and push messages over the WebSocket. The client re-fetches full event details via REST on receiving the push.
- "Update only this occurrence" vs "update all future": Three update modes map to three database operations. This occurrence only: insert a RecurrenceException row with modified_event_id. All future occurrences: add UNTIL to the old RRULE and create a new series. All occurrences: update the base event row directly (changing every future expansion).
- Timezone handling: Store all times in UTC in the database. Store the event's timezone string separately. RRULE expansion must use the event timezone to compute correct occurrence times across DST transitions. Free/busy bitmaps use UTC day boundaries.
- Scale: 500M users, 25B events, 10M range queries/second. Handled by: (1) composite B-tree index on (user_id, start_time, end_time), (2) Redis calendar cache per user, (3) three read replicas routing calendar queries away from the write primary.
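The DST point in the timezone item is worth one concrete demonstration. Using the stdlib zoneinfo module (an assumed implementation choice), a weekly 10:00 America/New_York meeting expanded across the November 2026 fall-back keeps its local time while its UTC storage value shifts by an hour:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# Tue 2026-10-27 is before the DST change (clocks fall back Sun 2026-11-01).
local = datetime(2026, 10, 27, 10, 0, tzinfo=NY)
utc_times = []
for _ in range(2):
    utc_times.append(local.astimezone(ZoneInfo("UTC")))
    # Aware-datetime + timedelta does wall-clock arithmetic with zoneinfo,
    # so the meeting stays at 10:00 local across the transition.
    local += timedelta(weeks=1)

# utc_times: 14:00 UTC (EDT, UTC-4) then 15:00 UTC (EST, UTC-5).
```

Expanding in UTC instead would silently move the meeting to 09:00 local after the transition, which is exactly the bug the "expand in the event timezone" rule prevents.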