Ad Platform
Walk through a complete ad platform design, from a basic campaign CRUD service to a two-stage retrieval-scoring pipeline serving the best ad per impression in under 100ms at 10B daily impressions, with smooth budget pacing and reliable attribution.
What is an ad management and serving system?
An ad management and serving system connects advertisers with users by selecting the best-matching creative per impression in under 100ms. The engineering challenge is the selection pipeline: thousands of campaigns compete for every impression and the winner must be chosen by relevance, bid, and remaining budget in single-digit milliseconds. This question tests candidate-retrieval data structures, budget pacing with distributed counters, high-throughput event ingestion, and delayed attribution, making it one of the richest hard-tier system design questions.
Functional Requirements
Core Requirements
- Advertisers can create campaigns with targeting criteria (demographics, interests, keywords), daily and total budgets, a schedule, and creative assets (image, headline, CTA, destination URL).
- When a user's feed loads, the system selects the best-matching ad(s) to show in under 100ms.
- Ad impressions and clicks are recorded and attributed to campaigns without data loss.
- Advertisers can view spend, impressions, clicks, and conversion dashboards, updated within minutes.
Below the Line (out of scope)
- ML model training for click-through-rate (CTR) prediction.
- Real-time bidding (RTB) with external demand-side platforms (DSPs).
- Billing, invoicing, and payment processing.
- Content review and brand safety moderation.
The hardest part in scope: Selecting the best ad per impression in under 100ms. With tens of thousands of active campaigns and complex targeting predicates, a naive database scan fails immediately. The selection pipeline needs a candidate retrieval stage (coarse, completing in milliseconds) followed by a scoring stage (fine, ranking candidates by expected revenue per impression).
ML model training is below the line because it is a data-infrastructure problem separate from serving. The serving layer consumes a pre-trained CTR model as a black box. To add training, I would ship impression and click events to a feature store (Feast), run periodic training jobs with Spark or XGBoost, and deploy model artifacts to a model registry read by the scoring service.
RTB is below the line because it introduces sub-50ms latency budgets, the OpenRTB protocol, and external DSP integrations. To add RTB, the selection response from the first-party serving layer would be submitted as a floor price in an exchange auction alongside bids from external DSPs.
Billing does not change the serving path and is handled by a standalone billing service. A billing service would accumulate spend debits against a prepaid credit ledger and pause campaigns when the balance reaches zero.
Content review is below the line because moderation requires human reviewers or a specialized ML classifier and does not affect the serving hot path. To add it, new campaigns would route through a content classification service before transitioning from pending_review to active.
Non-Functional Requirements
Core Requirements
- Selection latency: Ad selection completes in under 100ms p99. The user's feed must not block on ad selection.
- Scale: 10 billion impressions per day, approximately 120,000 impression requests per second at peak. Click rate is typically 1 to 2 percent, giving 100 to 200 million clicks per day.
- Budget accuracy: A campaign's daily spend never exceeds its budget by more than 5%. Overspend protection is a contractual obligation with advertisers.
- Event durability: Impressions and clicks are captured with at-least-once delivery. Kafka retains events for 7 days for replay; acceptable duplicate rate after deduplication is under 0.01%. Losing an event is worse than recording it twice.
- Reporting freshness: Campaign dashboards reflect spend and engagement within 5 minutes. Exact real-time is not required; billing reconciliation runs hourly.
- Availability: 99.99% uptime for ad selection. Missing an impression is a direct revenue loss event. The serving path needs active-active multi-region deployment.
Below the Line
- Sub-10ms ad selection (would require on-device ML and edge-cached candidate sets)
- Exactly-once event delivery end-to-end (at-least-once with idempotent deduplication is sufficient and far cheaper)
Read/write ratio: For every campaign created, tens of millions of impression requests are evaluated against it. The serving path dwarfs campaign management by a factor of 10 million or more. Optimize for serving latency and throughput; campaign CRUD is a rounding error.
Under 100ms selection latency means a full database scan over all active campaigns is never viable at 120K requests per second. Even 1ms of database work per request adds up to 120 core-seconds of work every second, a load no single instance can absorb. The selection tier must operate from in-memory data structures pre-loaded from the campaign store.
Budget accuracy at 120K impressions per second requires a distributed spend counter that can be decremented atomically without a centralized lock. This directly drives the budget pacing deep dive.
Core Entities
- Campaign: A business's advertising unit: budget (daily and total), schedule, targeting criteria, and a list of associated ads.
- Ad: A single creative unit (headline, image URL, destination URL, CTA) belonging to one campaign.
- Impression: A recorded event of an ad being displayed to a user. The primary billing and reporting unit.
- Click: A recorded event of a user clicking on an ad, referencing the originating impression ID.
- Conversion: A recorded event (purchase, sign-up) after a click or impression, reported by the advertiser's pixel or S2S postback.
- User: A platform user with behavioral signals, owned by the user profile service; the targeting layer queries it on each selection request.
The primary relationship is Campaign (1) to Ad (many). Impressions reference both the Ad shown and the User who saw it. Clicks reference an Impression. Conversions reference either a Click (click-through) or an Impression (view-through). Full schema and indexing are deferred to the deep dives.
API Design
One endpoint per core functional requirement, grouped by the requirement it satisfies.
FR 1: Create a campaign
POST /campaigns
Authorization: Bearer {advertiser_token}
Body: {
name: string,
daily_budget_cents: number,
total_budget_cents: number,
start_date: "YYYY-MM-DD",
end_date: "YYYY-MM-DD",
targeting: {
age_range: [25, 45],
interests: ["travel", "photography"],
geo: { country: "US", regions: ["CA", "NY"] },
keywords: ["mirrorless camera"]
}
}
Response 201: { campaign_id: "cmp_7x9q2k", status: "pending_review" }
The campaign starts in pending_review to allow a lightweight content check before it enters the serving pool. No ads are served from a pending_review campaign.
Create an ad inside a campaign:
POST /campaigns/{campaign_id}/ads
Authorization: Bearer {advertiser_token}
Body: {
headline: string,
image_url: string,
destination_url: string,
cta_text: string,
bid_cents: number
}
Response 201: { ad_id: "ad_4f8vja" }
bid_cents is the advertiser's maximum cost-per-click. Combined with predicted CTR, it drives the eCPM score used for selection: eCPM = bid_cents * predicted_ctr * 1000.
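To make the formula concrete, here is a tiny worked example with illustrative numbers: a lower bid with better predicted relevance outranks a higher bid.

```python
def ecpm(bid_cents, predicted_ctr):
    """Expected revenue per 1,000 impressions, in cents, for a CPC bid."""
    return bid_cents * predicted_ctr * 1000

# Illustrative bids: a 200-cent bid at 0.4% CTR vs a 90-cent bid at 1.2% CTR.
premium = ecpm(200, 0.004)   # ~800 cents per mille
relevant = ecpm(90, 0.012)   # ~1080 cents per mille
assert relevant > premium    # relevance beats raw bid
```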
FR 2: Ad selection (the serving endpoint)
GET /ads/select?user_id=u_abc&placement=feed&count=1
Authorization: Bearer {platform_token}
Response 200: {
ads: [{
ad_id: "ad_4f8vja",
impression_id: "imp_9z3x1q",
headline: "Capture every moment",
image_url: "https://cdn.example.com/creatives/ad_4f8vja.jpg",
destination_url: "https://advertiser.example.com/cameras",
cta_text: "Shop now"
}]
}
The impression_id is generated server-side before responding so the client can fire an impression event using this same ID without a round trip. This separates selection latency from event recording latency.
FR 3: Record impressions and clicks
POST /events/impression
Body: { impression_id, ad_id, campaign_id, user_id, timestamp, placement }
Response 200: { ok: true }
POST /events/click
Body: { impression_id, ad_id, campaign_id, user_id, timestamp }
Response 200: { ok: true }
The client fires both endpoints as non-blocking fire-and-forget calls. Clicks include impression_id so the attribution pipeline can link the click back to the impression that preceded it. Implementation is a non-blocking Kafka publish, not a synchronous DB INSERT; I cover the details in the deep dive.
FR 4: Campaign reporting
GET /campaigns/{campaign_id}/stats?start_date=...&end_date=...&granularity=day
Authorization: Bearer {advertiser_token}
Response 200: {
impressions: 4200000,
clicks: 63000,
ctr: 0.015,
spend_cents: 441000,
conversions: 840,
data: [{ date, impressions, clicks, spend_cents }]
}
Data is served from pre-aggregated rollup tables, not a live query over billions of raw event rows. Freshness is within 5 minutes per the NFR.
High-Level Design
The system has four distinct flows, each mapping to one functional requirement. I build the design incrementally: start with the simplest component pair, then add exactly what each new requirement demands.
1. Advertisers can create campaigns with targeting and creative assets
The write path is straightforward. An advertiser submits a campaign definition; the Management Service validates it, stores it in the Campaign DB, and enqueues a review job. Once approved, the campaign becomes eligible for serving.
Components:
- Management Service: Validates campaign and ad payloads, enforces budget minimums, writes to Campaign DB.
- Campaign DB: PostgreSQL. Stores campaigns, ads, targeting criteria, budget configurations, and approval states. Low write volume (hundreds per minute, not per second).
- Review Queue: A lightweight async queue. Campaign review is a human or rule-based process that runs outside the critical path.
Request walkthrough:
- Advertiser sends POST /campaigns with targeting rules and budget.
- Management Service validates the payload (budget > 0, dates are valid, targeting schema is correct).
- Management Service inserts the campaign into Campaign DB with status = pending_review.
- Management Service enqueues a review task and returns campaign_id to the advertiser.
- Review process approves the campaign, updating status = active.
- Campaign Sync Service (see next section) picks up the active campaign and loads it into the serving layer's in-memory index.
Creative assets (images, video) are uploaded directly to S3 by the advertiser and served globally via a CDN (CloudFront or Fastly). When an advertiser creates an ad, the image_url field contains the CDN URL, not the S3 origin URL. The Selection Service returns this CDN URL verbatim in the ad selection response; the client fetches the image directly from the nearest CDN edge node without touching the ad platform. A 500KB creative served from a nearby point of presence typically loads in tens of milliseconds, versus hundreds from a distant origin.
This covers campaign management and creative asset serving. Ad selection does not touch Campaign DB on the hot path; the serving layer reads from a pre-built in-memory index.
2. User's feed loads and the system selects the best ad in under 100ms
This is the hard part. Let me start with the naive approach and show why it fails before evolving to the correct design.
Naive approach: query Campaign DB directly
Components:
- Selection Service: Receives the ad select request with user context.
- Campaign DB: Scanned to find matching active campaigns.
Request walkthrough:
- Feed service sends GET /ads/select?user_id=u_abc&placement=feed.
- Selection Service queries Campaign DB: SELECT * FROM campaigns WHERE status='active' AND geo matches AND interest_tags overlap AND budget_remaining > 0.
- Selection Service scores the results and returns the top ad.
What breaks: Campaign DB has 50,000 active campaigns. A SELECT with an interests && ? array overlap operator and a remaining_budget > 0 check scans every row. At 120,000 requests per second, each query is 10 to 50ms on a warm DB. The database saturates well below this rate; no amount of indexing saves a full-table relevance scan under this load.
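A quick back-of-the-envelope makes the failure concrete; the per-core throughput figure is an assumption for illustration, not a benchmark.

```python
active_campaigns = 50_000
peak_rps = 120_000

# Every request that scans all campaigns evaluates this many rows per second.
row_evals_per_sec = active_campaigns * peak_rps   # 6,000,000,000

# Even at a generous (assumed) 10M row evaluations per second per core,
# the scan alone needs hundreds of cores before counting any I/O.
cores_needed = row_evals_per_sec / 10_000_000     # 600.0
```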
The fix: pre-built in-memory candidate index + two-stage selection
The key insight is that the candidate set for any given user is small (a few hundred campaigns) but the full campaign set is large (tens of thousands). We separate the problem into two steps: candidate retrieval (coarse, fast) and scoring (fine, computed on the small candidate set).
Evolved components:
- Campaign Sync Service: Watches Campaign DB for changes, rebuilds the candidate index in Redis and a local inverted index in each Selection Service instance.
- Selection Service (evolved): Loads the inverted index into memory at startup. Candidate retrieval is a lookup, not a query. Scoring runs in pure CPU compute against a candidate set of 200 to 500 ads.
- Ad Candidate Cache (Redis): Stores the latest targeting index as a set of precomputed audience segments mapped to eligible campaign IDs. Read-only on the hot path.
Evolved request walkthrough:
- Feed service sends GET /ads/select?user_id=u_abc&placement=feed.
- Selection Service fetches the user's audience segment IDs from the User Profile Service (cached in Redis, under 5ms).
- Selection Service performs an in-memory intersection against the local inverted index to retrieve 200 to 500 candidate campaign IDs.
- For each candidate, Selection Service computes eCPM = bid_cents * predicted_ctr * 1000, filters campaigns with zero remaining budget, and ranks by eCPM.
- Selection Service returns the top-ranked ad with a server-generated impression_id.
- Total time: under 15ms (5ms user profile lookup + under 10ms for in-memory retrieval and scoring).
I treat eCPM scoring and the inverted index structure as a black box here. The full mechanism, including how the index is built, how budget filtering is applied, and how predicted CTR is looked up, is covered in Deep Dive 1.
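Still, a toy sketch helps fix the shape of the two stages. The class and field names below are hypothetical, OR-matching across segments is a simplification of real targeting predicates, and the budget check is a plain dict standing in for the distributed budget counter covered later.

```python
from collections import defaultdict

class CandidateIndex:
    """Inverted index: audience segment -> eligible campaign IDs.
    Rebuilt offline by the Campaign Sync Service; read-only on the hot path."""

    def __init__(self):
        self.by_segment = defaultdict(set)

    def add_campaign(self, campaign_id, segments):
        for seg in segments:
            self.by_segment[seg].add(campaign_id)

    def retrieve(self, user_segments):
        # OR-match simplification: any campaign targeting one of the user's
        # segments becomes a candidate; scoring narrows the set afterward.
        candidates = set()
        for seg in user_segments:
            candidates |= self.by_segment.get(seg, set())
        return candidates

def select_best(candidates, bids_cents, predicted_ctr, remaining_budget_cents):
    """Stage two: rank candidates by eCPM, skipping exhausted budgets."""
    scored = [(bids_cents[c] * predicted_ctr[c] * 1000, c)
              for c in candidates if remaining_budget_cents.get(c, 0) > 0]
    return max(scored)[1] if scored else None

idx = CandidateIndex()
idx.add_campaign("cmp_7x9q2k", ["travel", "photography"])
idx.add_campaign("cmp_a1b2c3", ["photography"])
cands = idx.retrieve(["photography"])
winner = select_best(cands,
                     bids_cents={"cmp_7x9q2k": 90, "cmp_a1b2c3": 200},
                     predicted_ctr={"cmp_7x9q2k": 0.012, "cmp_a1b2c3": 0.004},
                     remaining_budget_cents={"cmp_7x9q2k": 500, "cmp_a1b2c3": 500})
# The higher-relevance campaign outranks the higher bid on eCPM.
assert winner == "cmp_7x9q2k"
```

Both stages are pure in-memory compute, which is what keeps the whole selection step in single-digit milliseconds.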
3. Impressions and clicks are recorded without data loss
Naive approach: synchronous DB INSERT per event
Each POST /events/impression triggers a direct INSERT into an Impressions table. At 120,000 impressions per second, a standard PostgreSQL instance handles roughly 5,000 to 20,000 inserts per second. The ingestion endpoint becomes the system's bottleneck and failure point.
The fix: fire-and-forget to Kafka, consumer writes to analytics store
The key insight is that recording an event does not need to be synchronous. The client does not need the row inserted before the response returns. We publish to Kafka (under 5ms) and let consumers write durably downstream.
Evolved components:
- Event Ingestion Service: Thin HTTP receiver. Validates the event, publishes to a Kafka topic, returns 200. No DB access on the hot path.
- Kafka (Events Topic): Partitioned by campaign_id. Provides durable, ordered, at-least-once delivery. Consumers can replay from any offset.
- Event Consumer: Batch-reads from Kafka, deduplicates by impression_id, writes to the analytics store.
- Analytics Store (ClickHouse): Column-oriented, optimized for aggregation queries. Ingests at millions of rows per second via bulk inserts.
Request walkthrough:
- Client fires POST /events/impression (non-blocking; user sees no delay).
- Event Ingestion Service publishes { impression_id, ad_id, campaign_id, user_id, timestamp } to the events.impressions Kafka topic.
- Ingestion Service returns 200 immediately.
- Event Consumer reads batches from Kafka, deduplicates on impression_id, bulk-inserts into ClickHouse.
- Reporting aggregator (Flink job) computes 5-minute rollups per campaign, writes aggregated rows to the rollup table queried by the dashboard.
4. Advertisers view campaign dashboards updated within minutes
This flow reads from the rollup tables populated in the previous step.
Components:
- Reporting Service: Reads pre-aggregated rollup tables. No live queries over raw event data.
Request walkthrough:
- Advertiser sends GET /campaigns/{id}/stats?start_date=...&end_date=....
- Reporting Service queries the rollup table: SELECT SUM(impressions), SUM(clicks), SUM(spend_cents) FROM campaign_rollups WHERE campaign_id = ? AND date BETWEEN ? AND ?.
- Reporting Service returns aggregated metrics. No raw event scan required.
The dashboard endpoint is a single aggregate query over campaign_rollups. No complexity here; the hard work happened at write time. I will not add a separate diagram for this step since the Flink rollup path was already shown in the diagram above.
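The rollup rows the dashboard reads can be sketched as a small batch aggregation standing in for the continuous Flink job; field names mirror the event and stats payloads above, and the 300-second bucket matches the 5-minute freshness target.

```python
from collections import defaultdict

def rollup(events, bucket_seconds=300):
    """Aggregate raw events into (campaign_id, time_bucket) rows -- the shape
    of the campaign_rollups table. A Flink job does this continuously."""
    agg = defaultdict(lambda: {"impressions": 0, "clicks": 0, "spend_cents": 0})
    for e in events:
        bucket = e["timestamp"] // bucket_seconds * bucket_seconds
        row = agg[(e["campaign_id"], bucket)]
        row["impressions" if e["type"] == "impression" else "clicks"] += 1
        # CPC billing assumption: spend accrues on clicks, not impressions.
        row["spend_cents"] += e.get("cost_cents", 0)
    return dict(agg)

rows = rollup([
    {"campaign_id": "cmp_7x9q2k", "timestamp": 100, "type": "impression"},
    {"campaign_id": "cmp_7x9q2k", "timestamp": 250, "type": "click", "cost_cents": 90},
])
assert rows[("cmp_7x9q2k", 0)] == {"impressions": 1, "clicks": 1, "spend_cents": 90}
```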
Potential Deep Dives
The High-Level Design established that the system works. These deep dives address the hard mechanisms deferred above: the selection pipeline internals, budget pacing, the attribution model, and the event pipeline at full scale.
1. How do we select the best ad for a user in under 100ms?
Three constraints drive the design: the candidate space is large (50,000 active campaigns), the user context changes on every request, and the hard latency ceiling is 100ms. The selection mechanism must handle all three simultaneously.
2. How do we pace budget so a campaign does not exhaust its daily spend in the first hour?
Without pacing, a campaign with a $100 daily budget and high targeting overlap would receive all 100M eligible impressions in the first few minutes, spend its entire budget by 8am, and go dark for the rest of the day. Advertisers pay for reach across the day, not a burst.
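The pacing mechanism can be sketched as a token bucket refilled in proportion to the hour's expected traffic. This is an in-process illustration; in production the bucket lives in Redis and is decremented atomically (for example via a Lua script), and the hourly weights come from historical traffic curves rather than the flat placeholder used here.

```python
def refill_per_second(daily_budget_cents, hour, hourly_weights):
    """Cents the Pacemaker adds to a campaign's bucket each second of `hour`.
    Weights are normalized so budget is spread proportionally to expected
    traffic rather than uniformly across the day."""
    share = hourly_weights[hour] / sum(hourly_weights)
    return daily_budget_cents * share / 3600

class BudgetBucket:
    def __init__(self):
        self.tokens = 0.0

    def refill(self, cents):
        self.tokens += cents

    def try_spend(self, cost_cents):
        # An empty bucket means the campaign sits out this impression rather
        # than overspending: under-serving is the safe failure mode.
        if self.tokens >= cost_cents:
            self.tokens -= cost_cents
            return True
        return False

# A $100/day campaign with a flat traffic curve refills ~0.116 cents/second.
flat = [1.0] * 24
rate = refill_per_second(10_000, hour=12, hourly_weights=flat)
assert abs(rate - 10_000 / (24 * 3600)) < 1e-9
```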
3. How do we capture billions of impression and click events without data loss?
The event pipeline is a write-heavy path: 120,000 impression events per second at peak, plus 100 to 200 million click events per day. The requirement is at-least-once delivery with idempotent downstream processing.
4. How do we attribute a conversion back to the ad that drove it?
A user clicks an ad for a camera, browses the advertiser's site, and purchases 3 days later. Which ad gets credit? Attribution resolves this by linking the conversion event back to one or more prior ad interactions.
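A minimal sketch of window-based last-touch attribution, using the click-through (7-day) and view-through (1-day) windows assumed in this design; the in-memory lists stand in for what would be a join over the event store.

```python
DAY = 86400
CLICK_WINDOW = 7 * DAY   # click-through attribution window
VIEW_WINDOW = 1 * DAY    # view-through attribution window

def attribute(conversion_ts, clicks, impressions):
    """Last-touch attribution: prefer the most recent click within 7 days,
    else the most recent impression within 1 day, else unattributed.
    `clicks` and `impressions` are lists of (ad_id, timestamp) tuples."""
    eligible_clicks = [(ts, ad) for ad, ts in clicks
                       if 0 <= conversion_ts - ts <= CLICK_WINDOW]
    if eligible_clicks:
        return max(eligible_clicks)[1]
    eligible_views = [(ts, ad) for ad, ts in impressions
                      if 0 <= conversion_ts - ts <= VIEW_WINDOW]
    if eligible_views:
        return max(eligible_views)[1]
    return None

# A click 3 days before the conversion beats a more recent impression-only view.
assert attribute(10 * DAY,
                 clicks=[("ad_4f8vja", 7 * DAY)],
                 impressions=[("ad_other", int(9.5 * DAY))]) == "ad_4f8vja"
```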
Final Architecture
The two-stage selection pipeline is the architectural core: in-memory candidate retrieval (under 3ms) followed by eCPM scoring (under 1ms) keeps ad selection at under 15ms, far below the 100ms SLA. Budget token buckets in Redis protect advertiser budgets without touching the Campaign DB on the hot path. The event pipeline decouples high-throughput ingestion from analytics: Kafka absorbs 120K events per second, Flink deduplicates and aggregates, and ClickHouse serves sub-second reporting queries over billions of rows.
Interview Cheat Sheet
- Start by separating the two design problems: campaign management (low-volume write CRUD) and ad selection (high-volume read, 120K requests per second). They have almost nothing in common architecturally.
- State the core latency budget upfront: 100ms total for ad selection means the selection service itself must stay under 15ms to leave headroom for network, feed assembly, and gateway overhead.
- A naive DB scan of 50,000 campaigns per impression request does not work. The fix is a two-stage pipeline: in-memory inverted index for candidate retrieval (segment intersection, microseconds) followed by eCPM scoring (pure compute, no I/O).
- eCPM is the ranking signal: eCPM = bid_cents * predicted_ctr * 1000. Advertisers who bid more but have poor CTR lose to lower-bidding ads with higher relevance. This keeps ad quality high and protects user experience.
- The CTR model is a black box on the serving hot path; it runs offline, and its outputs are cached in a lookup table keyed by (ad_id, user_segment_bucket). No real-time inference on the serving path.
- Budget pacing uses a Redis token bucket per campaign, refilled by a Pacemaker Service every second at a rate proportional to the traffic weight of the current hour. Smooth pacing prevents budget exhaustion in the first hour of the day.
- Pacemaker is single-writer, leader-elected. If the Pacemaker misses a tick, the campaign under-serves slightly rather than over-spending. Under-serving is always safer than overspending.
- Creative assets (images, video) are stored in S3 and served via a CDN (CloudFront or Fastly). The Selection Service returns the CDN URL; the client fetches the asset from the nearest edge node, keeping creative load times in the tens of milliseconds.
- Event ingestion is fire-and-forget: the client fires the event endpoint, and the ingestion service publishes to Kafka and returns 200 in under 5ms. Kafka provides at-least-once delivery; Flink deduplicates on impression_id in a 10-minute window downstream.
- Use ClickHouse for analytics storage, not PostgreSQL. ClickHouse is column-oriented and ingests millions of rows per second. A SELECT SUM(impressions) WHERE campaign_id = ? over billions of rows returns in under 100ms.
- Attribution windows: click-through (7 days) and view-through (1 day). For multi-touch, time-decay attribution gives more credit to recent touchpoints while crediting upper-funnel campaigns that built awareness.
- Cross-device attribution requires a deterministic identity graph (email hash links devices at login). Without it, a mobile click is invisible when the user converts on desktop.
- The Campaign DB is never on the hot path for selection. The serving layer reads from an in-memory index. The Campaign DB is touched only for management operations and as the source of truth for index rebuilds.