Design a Real-Time Bidding Ad Server — System Design Interview Guide

The "design a real-time bidding ad server" question asks you to build the system that, when a publisher's page loads, runs an auction across many advertisers, picks a winning ad, and returns the creative, all in under 100 milliseconds end-to-end. The hard part is running fan-out auctions at around a million requests per second while keeping budget pacing and frequency capping within tight, quantifiable error bounds.

By Sam K., Founder, InterviewChamp.AI · Last verified 2026-05-25

Reported in interviews at

Google
Meta
Amazon
The Trade Desk
Criteo

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

Accept an ad request from a publisher carrying user context (cookie ID, geo, device, page category)
Run a sealed-bid second-price auction across eligible advertiser campaigns
Return a winning creative (URL + tracking pixel) within a strict deadline (typically 100ms)
Track impression, click, and conversion events with attribution back to the winning campaign
Enforce per-campaign budget caps, daily pacing, and per-user frequency caps

Non-functional requirements

End-to-end latency: <100ms p99 from ad request to creative return
Scale: ~1M+ ad requests per second at peak (open-web RTB volumes)
Availability: 99.99%+; downtime is direct revenue loss for the publisher
Budget enforcement: a campaign with a $1000 daily cap may overspend by a few percent (tens of dollars) at most, not by hundreds of dollars
Pacing: spread spend smoothly across the day rather than burning the budget in the first hour

Capacity estimation

Anchor on open-web programmatic scale: a large ad exchange sees ~1M+ bid requests per second at peak. Each request fans out to anywhere from 50 to 500 demand-side bidders, so internal RPC volume is 50M-500M RPCs/sec at the fan-out layer. The auction tier itself processes ~1M auctions/sec.

Latency budget breakdown for the 100ms deadline: ~10ms network ingress, ~5ms request parsing + user-context lookup, ~50ms bidder fan-out (the bidders themselves are budgeted at ~40ms response time), ~10ms auction resolution + winner selection, ~10ms creative URL generation + tracking pixel construction, ~15ms egress + buffer. Every millisecond is accounted for — late bids are dropped, not waited on.
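As a quick sanity check, the stage budgets above can be summed against the deadline. This is only a sketch: the stage key names are labels invented here for illustration, while the millisecond values are the ones quoted in the breakdown.

```python
# Stage budgets in milliseconds, copied from the breakdown above.
# The key names are hypothetical labels, not names from any real system.
LATENCY_BUDGET_MS = {
    "network_ingress": 10,
    "parse_and_profile_lookup": 5,
    "bidder_fanout": 50,
    "auction_resolution": 10,
    "creative_and_pixel": 10,
    "egress_and_buffer": 15,
}
BIDDER_RESPONSE_BUDGET_MS = 40   # what each bidder is given to respond
DEADLINE_MS = 100

total_ms = sum(LATENCY_BUDGET_MS.values())
# Slack inside the fan-out stage for network round trips to the bidders.
fanout_overhead_ms = LATENCY_BUDGET_MS["bidder_fanout"] - BIDDER_RESPONSE_BUDGET_MS

print(f"total budget: {total_ms} ms of {DEADLINE_MS} ms")
print(f"fan-out overhead beyond bidder time: {fanout_overhead_ms} ms")
```

The stages consume the whole deadline, which is why late bids have to be dropped rather than waited on.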

Storage estimates: ~1M bid requests/sec × 86,400 sec/day = 86B bid requests/day. Logging every bid (for billing and reconciliation) at ~1 KB per record = ~86 TB/day raw. Real systems sample aggressively for analytics (1-5% sample) and keep full data only in cold storage for the dispute window (90 days typical). Hot index of recent auctions for ~1 hour = ~3.6 TB.

User profile store: ~2B unique cookies/device IDs × ~2 KB profile (segments, interests, frequency history) = ~4 TB. Sharded by user ID with replication. Read-heavy: every ad request reads the user profile, while writes happen only on segment updates and frequency-count increments.

Budget counters: ~10M active campaigns × ~50 bytes per counter (spent_today, daily_cap, hourly_pace_target) = ~500 MB. This fits easily in memory, so counters are sharded for write throughput rather than capacity. The hard part is consistency; see the deep dive.
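The estimates in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted above and assumes decimal units (1 KB = 1,000 bytes); the variable names are illustrative.

```python
# Back-of-envelope capacity numbers, using the figures quoted in this section.
# Decimal units are assumed (1 KB = 1,000 bytes, 1 TB = 10**12 bytes).
KB, MB, TB = 1_000, 1_000**2, 1_000**4

PEAK_QPS = 1_000_000                 # bid requests per second at peak
FANOUT_MIN, FANOUT_MAX = 50, 500     # bidders contacted per request
SECONDS_PER_DAY = 86_400
RECORD_BYTES = 1 * KB                # ~1 KB per logged bid request

fanout_rps_min = PEAK_QPS * FANOUT_MIN
fanout_rps_max = PEAK_QPS * FANOUT_MAX
requests_per_day = PEAK_QPS * SECONDS_PER_DAY
raw_log_tb_per_day = requests_per_day * RECORD_BYTES / TB
hot_index_tb = PEAK_QPS * 3_600 * RECORD_BYTES / TB      # one hour of auctions
profile_store_tb = 2_000_000_000 * 2 * KB / TB           # 2B profiles x 2 KB
budget_counters_mb = 10_000_000 * 50 / MB                # 10M campaigns x 50 B

print(f"fan-out RPCs/sec: {fanout_rps_min:,} to {fanout_rps_max:,}")
print(f"bid requests/day: {requests_per_day:,}")
print(f"raw logs: {raw_log_tb_per_day:.1f} TB/day, hot index: {hot_index_tb:.1f} TB")
print(f"profiles: {profile_store_tb:.1f} TB, counters: {budget_counters_mb:.0f} MB")
```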

High-level design

Five core services: ad request router, user profile service, bidder fan-out + auction engine, budget + pacing counters, and impression/click tracking.

The ad request router is the public edge. It accepts a request from the publisher's ad tag, parses the user cookie, normalizes geo + device + page context into a structured bid request, and forwards it to the auction engine. Routers are stateless and horizontally scaled behind a global anycast load balancer; they're the first place to enforce a hard 100ms timeout.

The user profile service returns the user's segments and recent ad history. It's a sharded in-memory key-value store keyed by cookie ID, with an asynchronous update path fed by the analytics pipeline (segment recomputation runs offline). The read must complete in single-digit milliseconds because it blocks the auction.
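One simple way to route a cookie ID to a profile shard is a stable hash taken modulo the shard count. The sketch below assumes that scheme; the source only says the store is sharded by cookie ID, and the function name `shard_for` is hypothetical.

```python
import hashlib


def shard_for(cookie_id: str, num_shards: int) -> int:
    """Map a cookie ID to a shard index with a hash that is stable across processes."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(cookie_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


# Every router computes the same shard for the same cookie.
print(shard_for("cookie-abc123", 64))
```

Plain modulo hashing moves most keys when the shard count changes, so a design that resizes often would more likely use consistent hashing or a directory of key ranges.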

The auction engine takes the enriched bid request and fans out to eligible bidders (internal advertiser campaigns + external DSPs in an open exchange). It applies a hard deadline — typically 50ms — and runs a sealed-bid second-price auction on whatever bids returned in time. Bidders that don't respond by the deadline are dropped, and the engine logs the timeout for bidder SLO monitoring.
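The deadline-then-auction step can be sketched as a pure function. This is a minimal illustration, not a production engine: bidder latencies are passed in as data rather than measured, and the `Bid` and `AuctionResult` types and the `floor_cpm` parameter are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Bid:
    bidder_id: str
    price_cpm: float    # bid price in dollars per thousand impressions
    latency_ms: float   # how long the bidder took to respond


@dataclass(frozen=True)
class AuctionResult:
    winner: Bid
    clearing_price_cpm: float
    timed_out: Tuple[str, ...]   # bidder IDs to log for SLO monitoring


def run_second_price_auction(
    bids: List[Bid], deadline_ms: float = 50.0, floor_cpm: float = 0.0
) -> Optional[AuctionResult]:
    """Drop late bids, then charge the winner the second-highest on-time price."""
    timed_out = tuple(b.bidder_id for b in bids if b.latency_ms > deadline_ms)
    eligible = sorted(
        (b for b in bids if b.latency_ms <= deadline_ms and b.price_cpm >= floor_cpm),
        key=lambda b: b.price_cpm,
        reverse=True,
    )
    if not eligible:
        return None
    # With a single eligible bid, the winner pays the floor (an assumption here).
    clearing = eligible[1].price_cpm if len(eligible) > 1 else floor_cpm
    return AuctionResult(eligible[0], clearing, timed_out)


result = run_second_price_auction(
    [Bid("a", 5.0, 20.0), Bid("b", 4.0, 30.0), Bid("c", 9.0, 70.0)]
)
```

In this example bidder `c` offers the highest price but misses the 50ms deadline, so `a` wins and pays `b`'s price.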

The budget + pacing service holds the current spend counters per campaign and the pacing target (how much should have been spent by this point in the day). Before a campaign's bid is included in the auction, the engine checks that (a) the campaign hasn't exceeded its daily cap and (b) its spend isn't ahead of the pacing target. Both checks must be cheap because they're on the hot path of every auction.

Tracking ingestion: when an ad is served, the response includes an impression pixel URL and click-tracking URLs. The pixel fires on render; the click URL fires on user click. Both routes write to an event queue, which is consumed by an attribution worker that increments the campaign's spend counter and writes the impression record to a long-term ledger.

The consistency boundary that matters: the bid-time budget check reads from a low-latency in-memory counter, but the impression event that confirms the spend arrives milliseconds-to-seconds later, asynchronously. This gap is the source of all budget-overspend pain — covered in deep dive.

Deep dive — the hard problem

Three deep dives: budget pacing under high fan-out, frequency capping with cross-shard consistency, and cold start for new campaigns.

Budget pacing: the fundamental problem. A campaign with a $1000 daily budget and popular targeting might be eligible for hundreds of thousands of auctions per second. Without pacing, the campaign burns through its budget in the first few minutes of peak traffic and misses the rest of the day. The fix is a pacing target: at hour H of the day, the campaign should have spent ~H/24 of its budget (with adjustments for traffic patterns, such as more spend during peak hours). Before joining an auction, the engine compares spent_so_far against the pacing target and either skips the bid (if ahead) or bids normally (if behind).
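The pacing check described above reduces to a comparison against a target curve. The sketch below assumes the simplest curve, linear H/24, and leaves out the traffic-pattern adjustment; the function names are illustrative.

```python
def pacing_target(daily_budget: float, hour_of_day: float) -> float:
    """Dollars the campaign should have spent by this hour, assuming linear pacing."""
    clamped_hour = min(max(hour_of_day, 0.0), 24.0)
    return daily_budget * clamped_hour / 24.0


def should_bid(spent_so_far: float, daily_budget: float, hour_of_day: float) -> bool:
    """Skip the auction if the daily cap is hit or spend is ahead of the target."""
    if spent_so_far >= daily_budget:
        return False
    return spent_so_far <= pacing_target(daily_budget, hour_of_day)


# At noon a $1000/day campaign should have spent about $500.
print(should_bid(400.0, 1000.0, 12.0))   # behind target, so it bids
print(should_bid(600.0, 1000.0, 12.0))   # ahead of target, so it skips
```

A real controller would typically replace the linear curve with one shaped by expected traffic, and might throttle probabilistically instead of with a hard on/off switch.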

The hard part is cross-shard counter consistency. Budget counters are sharded for write throughput, because one shard can't handle a million increments per second. So a campaign's spend is the sum across N counter shards, each tracking its own slice of impressions. The pacing check needs the total, but reading all N shards on every bid is expensive. The standard solution is local budget allocation: each pacing shard is pre-allocated a chunk of the campaign's daily budget (e.g., $50 per shard for a $1000/day campaign across 20 shards). Bidders only check their local shard. The pacing controller redistributes allocations on a fixed cycle (every minute to every few minutes) based on which shards are spending faster. Overspend is bounded by what the shards can spend within one redistribution cycle.
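Local budget allocation can be sketched with a per-shard counter and a periodic rebalance. This is an illustrative model under stated assumptions, not a reference design: the `ShardBudget` class and the rule of redistributing unspent budget in proportion to each shard's spend in the last cycle are choices made for the example.

```python
from typing import List


class ShardBudget:
    """One counter shard's local slice of a campaign's daily budget."""

    def __init__(self, allocation: float) -> None:
        self.allocation = allocation   # cumulative dollars this shard may spend
        self.spent = 0.0               # cumulative dollars spent by this shard
        self.cycle_spend = 0.0         # dollars spent since the last rebalance

    def try_spend(self, cost: float) -> bool:
        """Hot-path check: purely local, no cross-shard reads."""
        if self.spent + cost > self.allocation:
            return False
        self.spent += cost
        self.cycle_spend += cost
        return True


def split_evenly(daily_budget: float, num_shards: int) -> List[ShardBudget]:
    return [ShardBudget(daily_budget / num_shards) for _ in range(num_shards)]


def rebalance(shards: List[ShardBudget]) -> None:
    """Pool unspent budget and hand it back in proportion to last-cycle spend."""
    remaining = sum(s.allocation - s.spent for s in shards)
    total_cycle_spend = sum(s.cycle_spend for s in shards)
    for s in shards:
        if total_cycle_spend > 0:
            share = s.cycle_spend / total_cycle_spend
        else:
            share = 1.0 / len(shards)   # nothing spent yet: split evenly
        s.allocation = s.spent + remaining * share
        s.cycle_spend = 0.0


shards = split_evenly(1000.0, 20)      # $50 per shard
accepted = shards[0].try_spend(40.0)   # fits in the local $50
rejected = shards[0].try_spend(20.0)   # would exceed $50, so refused locally
shards[1].try_spend(10.0)
rebalance(shards)                      # busy shards receive most of what is left
```

The total allocation is conserved across a rebalance, so the controller never hands out more than the daily budget. Note that this proportional rule gives idle shards nothing for the next cycle; a production controller would more likely keep a minimum share per shard.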

The overspend math: with a 1-minute redistribution cycle and a campaign delivering at ~10 impressions/sec per shard, a shard can serve ~600 impressions before the controller corrects it. Worst-case overspend per shard is therefore ~600 × CPM / 1000 (CPM is the price per thousand impressions), which is a few dollars at single-digit CPMs; for example, $2.40 at an illustrative $4 CPM. Across 20 shards, that's ~$50 of overspend on a $1000 campaign, or about 5%. That is acceptable for most advertisers; tighter constraints (programmatic guaranteed deals) need synchronous counter writes, accepting the latency hit.
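The overspend bound can be worked through numerically. The source does not state a CPM, so the $4 used below is purely an assumption chosen for illustration; it lands close to the ~$50 figure quoted above. CPM is a price per thousand impressions, hence the division by 1000.

```python
from typing import Tuple


def worst_case_overspend(
    cycle_seconds: float,
    impressions_per_sec_per_shard: float,
    cpm_dollars: float,
    num_shards: int,
) -> Tuple[float, float]:
    """Return (per-shard, total) dollars a campaign can overspend in one cycle."""
    impressions = cycle_seconds * impressions_per_sec_per_shard
    per_shard = impressions * cpm_dollars / 1000.0   # CPM is per 1,000 impressions
    return per_shard, per_shard * num_shards


# 60s cycle, 10 impressions/sec/shard, 20 shards; the $4 CPM is an assumed value.
per_shard, total = worst_case_overspend(60, 10, 4.0, 20)
overspend_pct = 100.0 * total / 1000.0               # against a $1000 daily budget
print(f"${per_shard:.2f} per shard, ${total:.2f} total, {overspend_pct:.1f}%")
```

Halving the redistribution cycle halves the bound, which is the lever to pull before resorting to synchronous counter writes.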

Frequency capping: show user X this ad at most 3 times per day. Reads happen on every bid (the engine needs the user's count for today); writes happen on every impression. The user profile service holds the counter. The wrinkle: an ad request might trigger multiple bids for the same user across multiple campaigns in the same auction. All of those bids see count=2 (under the cap of 3), but only one wins, so a single post-auction increment is fine. The race condition appears when a user opens two pages simultaneously: both auctions read count=2, both bid, both win their respective slots, and the user ends up seeing the ad 4 times. Fix: increment the counter at auction-win time, not at impression time, and treat impressions as the audit trail. This narrows the race window; closing it completely requires the check and the increment to happen as one atomic operation on the shard that owns the counter. Some platforms accept this race entirely (it affects a few percent of impressions) rather than adding cross-auction coordination.
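The win-time increment only closes the race if the check and the increment happen as one atomic step. The sketch below models that with an in-process lock standing in for whatever atomic conditional increment the real counter store provides; the `FrequencyCapStore` class is hypothetical and omits the daily reset.

```python
import threading
from typing import Dict, Tuple


class FrequencyCapStore:
    """In-process stand-in for the shard that owns a user's frequency counters."""

    def __init__(self, cap_per_day: int = 3) -> None:
        self._cap = cap_per_day
        self._counts: Dict[Tuple[str, str], int] = {}
        self._lock = threading.Lock()

    def try_reserve(self, user_id: str, campaign_id: str) -> bool:
        """Atomically check the cap and increment; call this at auction-win time."""
        key = (user_id, campaign_id)
        with self._lock:
            current = self._counts.get(key, 0)
            if current >= self._cap:
                return False
            self._counts[key] = current + 1
            return True

    def count(self, user_id: str, campaign_id: str) -> int:
        with self._lock:
            return self._counts.get((user_id, campaign_id), 0)


store = FrequencyCapStore(cap_per_day=3)
outcomes = [store.try_reserve("user-x", "campaign-1") for _ in range(4)]
```

Here the fourth reservation is refused even if the calls arrive from concurrent auctions, because no caller can read a stale count between the check and the write. If the creative never renders, the reservation would need to be released or expired, which this sketch leaves out.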

Third deep dive: the cold-start problem for new campaigns. A new campaign has no historical CTR or conversion data, so the auction engine can't predict its expected value. The standard approach is exploration: give the new campaign a small share of relevant auctions at a 'discovery' rate (random subset of eligible traffic) until it accumulates enough impressions to estimate CTR. After ~1000 impressions, the estimate stabilizes and the campaign joins the normal ranking. Mention this if asked about ML-driven auctions; it's a senior-bar topic.
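The exploration gate for new campaigns can be sketched as a simple admission rule. The ~1000-impression threshold comes from the text above; the 5% discovery rate and the function names are assumptions made for illustration.

```python
import random

EXPLORATION_THRESHOLD = 1000   # impressions before the CTR estimate is trusted
DISCOVERY_RATE = 0.05          # assumed share of eligible traffic for new campaigns


def estimated_ctr(clicks: int, impressions: int) -> float:
    """Observed click-through rate; zero until there is any data."""
    return clicks / impressions if impressions else 0.0


def admit_to_auction(
    impressions_so_far: int,
    rng: random.Random,
    discovery_rate: float = DISCOVERY_RATE,
    threshold: int = EXPLORATION_THRESHOLD,
) -> bool:
    """Mature campaigns always compete; new ones join a random slice of auctions."""
    if impressions_so_far >= threshold:
        return True
    return rng.random() < discovery_rate


rng = random.Random(0)
admitted = sum(admit_to_auction(10, rng) for _ in range(10_000))
print(f"new campaign admitted to {admitted} of 10,000 eligible auctions")
```

Once admitted during exploration, the campaign still needs some default score to rank with; a common choice is a prior CTR borrowed from similar campaigns, which is outside this sketch.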

Common mistakes

Treating budget counters as a strongly-consistent single row — the write QPS makes this infeasible at scale
Forgetting the bidder timeout — the auction must enforce a hard deadline, not wait for all bidders
Skipping pacing entirely — a campaign without pacing burns its budget in the traffic spike and underdelivers the rest of the day
Treating impression events as synchronous — they arrive milliseconds-to-seconds after the bid, asynchronously
Ignoring frequency-cap races across concurrent auctions — naive counter reads will overserve users by single-digit percentages

Likely follow-up questions

How would you support real-time bidding for video ads where the latency budget is even tighter?
What changes if you need to enforce a brand-safety check (no ads on adult content) on every auction?
How would you detect and filter invalid traffic (bots, fraud) before bills go out to advertisers?
How would you implement a private marketplace where only invited buyers can bid on a publisher's inventory?
How would you support header bidding where the publisher runs multiple exchanges in parallel and picks the highest?

Frequently asked questions

How long is a Design Ad Server interview?

60 minutes is typical at adtech-focused companies (Google Ads, The Trade Desk, Criteo). Expect deep coverage of latency budget + pacing + frequency capping. At general-purpose companies (Meta, Amazon) it's often 45 minutes with less emphasis on the bidding mechanics.

Do I need to understand the OpenRTB protocol?

No. Naming the 'open-exchange bid request format' as the integration boundary is enough. Specific fields (bidfloor, ext.prebid) are bonus signal but not required unless it's an adtech-platform-team interview.

What's the senior-bar topic?

Budget pacing under sharded counters. Specifically: explain how a campaign's spend is tracked across many shards, how pre-allocation prevents reconciliation traffic on the hot path, and how to bound overspend. If you can quantify the overspend (in dollars or percent), that's senior-plus signal.

Should I discuss machine learning for bid prediction?

Mention that the bid value is a model output (roughly predicted CTR × the advertiser's value per click, submitted only if it clears the publisher's bid floor) rather than a fixed number. Drilling into model architecture is overkill unless the interviewer asks; the system-design surface is feature serving + model rollout, not training.