Design a Push Notification Service — System Design Interview Guide

Design a Push Notification Service is a system-design interview question that asks you to build a multi-platform fanout system: applications submit a notification (with target users or a topic), and your service delivers it to every recipient's iOS, Android, and web clients within seconds. The hard parts are the device-token lifecycle (tokens expire, devices change owners, users uninstall), topic fanout at scale (a celebrity announcement targets 10M devices), and the choice between at-least-once and best-effort delivery for each notification class.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Meta
  • Amazon
  • Google
  • Microsoft
  • Uber

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Register a device: store (user_id, device_token, platform, app_version)
  • Send a notification to a single device, a user (all their devices), or a topic (everyone subscribed)
  • Support per-platform payload variations (iOS payload structure differs from Android)
  • Schedule notifications for future delivery
  • Honor per-user preferences (do-not-disturb hours, channel-level opt-out)
  • Provide a delivery report: was the platform handoff acknowledged?

Non-functional requirements

  • Scale: 1B registered devices, 10B notifications/day, peak 1M notifications/sec (breaking news event)
  • Latency: <5 seconds p95 from submission to platform handoff (provider receives the message)
  • Delivery semantics: at-least-once for transactional (login codes, order updates); best-effort for marketing
  • Availability: 99.95% on the submission API; the upstream provider determines actual delivery to the device
  • Topic fanout: a single submission to a 10M-device topic must complete platform handoff within 60 seconds

Capacity estimation

Public assumptions: 1B registered devices across all customer apps, ~10B notifications sent per day = ~115K/sec sustained. Peak during a major event (election result, sports outcome) hits 1M/sec for a 10-minute window.

Device registry: 1B devices × ~300 bytes per row (user_id, device_token, platform, app_version, locale, last_seen_at, preferences blob) = 300 GB. Sharded by user_id; secondary index on device_token for inbound platform feedback (token-invalidated callbacks from the provider).

Topic subscriptions: assume an average of 50 topic subscriptions per device. 1B devices × 50 = 50B subscription rows. Each row ~50 bytes = 2.5 TB. Sharded by topic_id (so a topic fanout is a single shard's range scan).

Notification log: 10B notifications/day × ~500 bytes audit row (notification_id, target, status, timestamps) × 90-day retention = 450 TB. Streamed to a log store rather than transactional database; queried only for debugging and analytics.

Device-token lifecycle: tokens expire constantly. A typical app sees 1-3% of tokens go invalid per day (app uninstall, user logout, OS-level token rotation). On 1B devices that's 10-30M token invalidations per day, or roughly 115-350/sec. The platform providers return 'token invalid' on send attempts; the service ingests these responses and prunes the registry. Failing to prune leads to wasted send attempts and rising provider error rates.

Provider rate limits: the platform providers (APNS, FCM, web push) impose connection-level rate limits. At 1M/sec peak we'll need hundreds-to-thousands of concurrent HTTP/2 connections per provider, distributed across many worker hosts and IP ranges. Stay under the per-connection rate limit (a few hundred messages per second per connection) by horizontal fanout, not by stuffing more messages into a single connection.

Fanout amplification: a single topic-broadcast submission of '1 KB payload to a 10M-device topic' becomes 10M individual provider sends — a 10,000,000x amplification. The fanout worker pool must be sized for the peak topic event, not the steady-state submission rate.
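The estimates in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative.

```python
# Back-of-envelope check of the capacity figures quoted above.
SECONDS_PER_DAY = 86_400
GB, TB = 10**9, 10**12

devices = 1_000_000_000
notifications_per_day = 10_000_000_000

sustained_per_sec = notifications_per_day / SECONDS_PER_DAY

registry_bytes = devices * 300                # ~300 bytes per device row
subscription_rows = devices * 50              # ~50 topic subscriptions per device
subscription_bytes = subscription_rows * 50   # ~50 bytes per subscription row
log_bytes = notifications_per_day * 500 * 90  # ~500-byte audit row, 90-day retention

# 1-3% of tokens go invalid per day; convert to a per-second prune rate.
prune_low_per_sec = devices * 0.01 / SECONDS_PER_DAY
prune_high_per_sec = devices * 0.03 / SECONDS_PER_DAY

print(f"sustained send rate: {sustained_per_sec:,.0f}/sec")
print(f"device registry:     {registry_bytes / GB:,.0f} GB")
print(f"subscriptions:       {subscription_bytes / TB:,.1f} TB")
print(f"notification log:    {log_bytes / TB:,.0f} TB")
print(f"token prunes:        {prune_low_per_sec:,.0f}-{prune_high_per_sec:,.0f}/sec")
```

In an interview, stating these numbers out loud with their inputs is usually enough; the point is that each figure follows from a stated assumption.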

High-level design

Four core services: ingest, fanout, sender workers, registry.

Ingest API accepts notification submissions. Each submission specifies: target (device_id, user_id, or topic), payload (per-platform variants), delivery class (transactional vs. marketing), optional schedule time, optional TTL (drop if not delivered within X minutes). The API authenticates the caller, validates the payload, assigns a notification_id, and writes the submission to a durable queue partitioned by target type.
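A minimal sketch of the ingest step, assuming an in-memory dict of lists stands in for the durable queue and that authentication has already happened upstream. The `Submission` fields mirror the ones listed above; the names `validate` and `accept` are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

TARGET_TYPES = {"device", "user", "topic"}
DELIVERY_CLASSES = {"transactional", "marketing"}

@dataclass
class Submission:
    target_type: str                       # "device", "user", or "topic"
    target_id: str
    payloads: dict                         # platform name -> payload dict
    delivery_class: str
    schedule_time: Optional[float] = None  # epoch seconds; None means send now
    ttl_seconds: Optional[int] = None      # drop if undelivered after this long
    notification_id: str = field(default_factory=lambda: uuid.uuid4().hex)

def validate(sub: Submission) -> None:
    if sub.target_type not in TARGET_TYPES:
        raise ValueError(f"unknown target type: {sub.target_type!r}")
    if sub.delivery_class not in DELIVERY_CLASSES:
        raise ValueError(f"unknown delivery class: {sub.delivery_class!r}")
    if not sub.payloads:
        raise ValueError("at least one platform payload is required")
    if sub.ttl_seconds is not None and sub.ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")

def accept(sub: Submission, queues: dict) -> str:
    """Validate, then append to the queue partition for this target type."""
    validate(sub)
    queues.setdefault(sub.target_type, []).append(sub)
    return sub.notification_id
```

Partitioning by target type keeps a large topic broadcast from sitting in front of single-device transactional sends in the same queue.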

Fanout service expands a single submission into per-device send tasks. For a single-device target, fanout is trivial — one row in, one task out. For a user target (a user has 3 devices), fanout produces 3 tasks. For a topic target (10M subscribers), fanout streams subscriber rows from the subscription store and emits send tasks in batches. The fanout service shards by target type and runs as a horizontally-scalable consumer.

The key invariant: fanout is idempotent. A duplicate fanout run (e.g., after a worker restart) produces the same set of send tasks with the same task_ids — the downstream sender deduplicates by task_id. This lets the fanout service crash-and-recover without producing duplicate notifications.
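One way to get this invariant is to derive each task_id deterministically from the notification and the device. The sketch below assumes a SHA-256 hash of `notification_id:device_id` and an in-memory set for the sender's dedup state; a real deployment would likely use a shared store with an expiry instead.

```python
import hashlib

def task_id(notification_id: str, device_id: str) -> str:
    """Same inputs always yield the same id, so a replayed fanout is harmless."""
    digest = hashlib.sha256(f"{notification_id}:{device_id}".encode("utf-8"))
    return digest.hexdigest()[:32]

def fan_out(notification_id: str, device_ids: list) -> list:
    return [
        {"task_id": task_id(notification_id, d), "device_id": d}
        for d in device_ids
    ]

class DedupingSender:
    def __init__(self):
        self.seen = set()   # task_ids already handed to the provider
        self.sent = []      # device_ids actually sent to, in order

    def handle(self, task: dict) -> bool:
        """Return True if the task was sent, False if it was a duplicate."""
        if task["task_id"] in self.seen:
            return False
        self.seen.add(task["task_id"])
        self.sent.append(task["device_id"])
        return True

# A fanout worker that crashes and reruns emits the same tasks again.
sender = DedupingSender()
devices = ["d1", "d2", "d3"]
for task in fan_out("n42", devices) + fan_out("n42", devices):
    sender.handle(task)
```

After the replay, `sender.sent` still holds three entries, one per device.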

Sender workers consume per-device send tasks and hand them to the platform provider (APNS for iOS, FCM for Android, web-push protocol for browsers, optional first-party SMS/email gateway for fallback). The sender holds long-lived HTTP/2 connections to each provider, multiplexing many sends over each connection. On success, the sender writes a delivery record to the log store and ACKs the task. On failure, the sender categorizes the error: (a) retriable transient (network blip, provider 5xx) — re-queue with backoff; (b) terminal token-invalid (user uninstalled the app) — emit a token-prune event for the registry; (c) terminal payload-rejected (the payload violated provider rules) — log and discard.
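The three failure categories map naturally onto a small dispatch table. The sketch below uses hypothetical normalized error kinds (`network_error`, `provider_5xx`, and so on), not real provider response codes; a real sender would translate each provider's responses into these categories. The backoff uses exponential growth with full jitter, which is one common choice among several.

```python
import random
from enum import Enum

class Action(Enum):
    ACK = "ack"                  # success: write delivery record, ACK the task
    RETRY = "retry"              # transient: re-queue with backoff
    PRUNE_TOKEN = "prune_token"  # terminal: emit a token-prune event
    DISCARD = "discard"          # terminal: log and drop

# Hypothetical normalized error kinds (an assumption for illustration).
_ACTIONS = {
    "ok": Action.ACK,
    "network_error": Action.RETRY,
    "provider_5xx": Action.RETRY,
    "token_invalid": Action.PRUNE_TOKEN,
    "payload_rejected": Action.DISCARD,
}

def classify(error_kind: str) -> Action:
    # Unknown errors are discarded rather than retried forever (an assumption).
    return _ACTIONS.get(error_kind, Action.DISCARD)

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0, rng=random) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Jitter matters here because a provider outage fails many tasks at once; without it, they would all retry at the same instant.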

Device registry holds (user_id, device_token, platform, preferences) keyed by device_id. Writes come from app-init (a device starts the app, sends its current platform token to your service) and from token-prune events (sender detected an invalid token). Reads come from fanout (target → list of (device_token, platform) pairs).

Scheduling: notifications with a future schedule_time go into a separate scheduled queue keyed by send_time. A dispatcher periodically (every few seconds) pulls due notifications from this queue and pushes them into the regular fanout pipeline. The scheduled queue is durable; an outage doesn't lose scheduled sends, only delays them.
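A minimal sketch of the dispatcher's "pull due notifications" step, assuming an in-memory min-heap stands in for the durable scheduled queue. The class and method names are hypothetical.

```python
import heapq
import itertools

class ScheduledQueue:
    def __init__(self):
        self._heap = []                 # entries: (send_time, seq, notification_id)
        self._seq = itertools.count()   # tie-breaker keeps insertion order stable

    def add(self, send_time: float, notification_id: str) -> None:
        heapq.heappush(self._heap, (send_time, next(self._seq), notification_id))

    def pop_due(self, now: float) -> list:
        """Remove and return every notification whose send_time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

q = ScheduledQueue()
q.add(30.0, "c")
q.add(10.0, "a")
q.add(20.0, "b")
first_tick = q.pop_due(now=20.0)    # the dispatcher would push these into fanout
second_tick = q.pop_due(now=100.0)
```

If the dispatcher is down for a while, the next `pop_due` call simply returns everything that became due in the meantime, which matches the "delayed, not lost" behavior described above.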

Preference enforcement: each device row holds per-channel opt-outs and do-not-disturb hours. Fanout consults these before emitting a send task; if the device has opted out of the channel or is in DND, the task is dropped (with an audit record). Marketing notifications are checked against both opt-outs and DND; transactional notifications (login codes) bypass DND but still honor channel-level opt-outs.
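The preference check can be sketched as a pure function. The shape of the `prefs` dict (`opted_out_channels`, `dnd`) is an assumption for illustration, and the DND window is evaluated against the device's local time, with windows that wrap past midnight handled explicitly.

```python
from datetime import time

def in_dnd(local_time: time, start: time, end: time) -> bool:
    """True if local_time falls in [start, end), including windows that cross midnight."""
    if start == end:
        return False                      # empty window
    if start < end:
        return start <= local_time < end
    return local_time >= start or local_time < end

def should_send(prefs: dict, channel: str, delivery_class: str, local_time: time) -> bool:
    # Channel-level opt-outs apply to every delivery class.
    if channel in prefs.get("opted_out_channels", ()):
        return False
    # Transactional notifications bypass do-not-disturb.
    if delivery_class == "transactional":
        return True
    dnd = prefs.get("dnd")                # (start, end) tuple or None
    if dnd is not None and in_dnd(local_time, dnd[0], dnd[1]):
        return False
    return True

prefs = {"opted_out_channels": {"promos"}, "dnd": (time(22, 0), time(7, 0))}
```

With these preferences, a login code at 23:30 is sent, a marketing push on another channel at 23:30 is dropped, and anything on the `promos` channel is dropped at any hour.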

Deep dive — the hard problem

Three deep dives: device-token lifecycle, topic fanout for 10M+ subscribers, and the delivery-guarantee tier with at-least-once vs. best-effort decisions.

Device-token lifecycle. Tokens are issued by the platform (APNS, FCM, web push) to the client app. The client should fetch its current token on each app launch, because the token can change after a reinstall, a restore onto a new device, or a platform-initiated rotation. The client must then register the token with your service; if it forgets or fails, the registry keeps a stale token and the next send attempt fails with token-invalid.

Four token-state transitions to handle: new device (no prior token), token-refresh (same device, new token), device-replaced (user got a new phone, old token is dead, new token is active under same user), and uninstall (token permanently dead, no replacement).

The registry stores tokens keyed by device_id (a stable client-generated UUID, not the token itself). When the client registers, it sends (device_id, current_token). The service upserts: if device_id exists with a different token, the old token is marked invalid and the new one takes its place (a token refresh on the same device); if device_id exists with the same token, no-op; if device_id is new, insert. A replacement phone registers under a new device_id for the same user, and the old phone's token is retired later by provider feedback. This keeps a single source of truth per device and handles token rotation cleanly.
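The three upsert branches can be sketched as follows, assuming a plain dict keyed by device_id stands in for the registry table. The row fields and the return strings are hypothetical.

```python
def register(registry: dict, device_id: str, user_id: str, token: str, platform: str) -> str:
    """Upsert one device row keyed by device_id; returns which branch was taken."""
    row = registry.get(device_id)
    if row is None:
        registry[device_id] = {
            "user_id": user_id,
            "device_token": token,
            "platform": platform,
            "active": True,
            "invalid_tokens": [],
        }
        return "inserted"
    if row["device_token"] == token:
        return "unchanged"                 # same device, same token: no-op
    # Same device, different token: retire the old one and store the new one.
    row["invalid_tokens"].append(row["device_token"])
    row["device_token"] = token
    row["user_id"] = user_id
    row["active"] = True
    return "rotated"

registry = {}
results = [
    register(registry, "dev-1", "user-1", "tokA", "ios"),
    register(registry, "dev-1", "user-1", "tokA", "ios"),
    register(registry, "dev-1", "user-1", "tokB", "ios"),
]
```

Because the key is device_id, the rotation from `tokA` to `tokB` updates one row instead of creating a second "device".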

Provider feedback closes the loop: when a sender attempt fails with token-invalid, the sender emits a token-prune event keyed by the failing token. The registry's prune worker looks up the token (using the secondary index on device_token), marks that device row as inactive, and stops sending to it. Without this loop, the registry accumulates dead tokens and the provider error rate climbs — eventually the provider rate-limits or blocks the sender for poor token hygiene.
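A minimal sketch of the prune path, assuming two in-memory dicts: the primary rows keyed by device_id and a secondary index from device_token to device_id. The class and method names are hypothetical.

```python
class Registry:
    def __init__(self):
        self.rows = {}              # device_id -> {"device_token", "active"}
        self.device_by_token = {}   # secondary index: device_token -> device_id

    def upsert(self, device_id: str, token: str) -> None:
        old = self.rows.get(device_id)
        if old is not None:
            # Drop the old index entry so a late prune for it cannot hit this device.
            self.device_by_token.pop(old["device_token"], None)
        self.rows[device_id] = {"device_token": token, "active": True}
        self.device_by_token[token] = device_id

    def prune(self, failing_token: str) -> bool:
        """Handle a token-prune event; returns True if a device was deactivated."""
        device_id = self.device_by_token.pop(failing_token, None)
        if device_id is None:
            return False            # unknown or already-rotated token
        self.rows[device_id]["active"] = False
        return True

    def sendable_tokens(self) -> list:
        return [r["device_token"] for r in self.rows.values() if r["active"]]

reg = Registry()
reg.upsert("d1", "tokA")
reg.upsert("d2", "tokB")
reg.upsert("d1", "tokC")            # d1 rotated from tokA to tokC
stale = reg.prune("tokA")           # late feedback for the rotated token
live = reg.prune("tokB")            # d2 uninstalled
```

The late prune for `tokA` is ignored because the device has already moved to `tokC`; only `d2` is deactivated.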

Topic fanout for 10M+ subscribers. A single topic-broadcast submission must complete in <60 seconds for 10M devices: 10M / 60 s ≈ 170K sends/sec for a one-minute window. The pattern has three steps.

Step 1: fanout planner shards the subscriber list. The subscription store is sharded by topic_id, so all subscribers to topic T live on one shard. The planner reads the shard, splits the subscriber list into batches of, say, 10K, and emits one fanout-batch task per batch. 10M subscribers / 10K per batch = 1,000 fanout-batch tasks.

Step 2: fanout-batch workers consume batches in parallel. Each batch worker hydrates the (device_token, platform) for its 10K device_ids (a single batched read from the registry), then emits per-device send tasks to the sender queue. The batch worker pool is sized large enough that 1,000 batches complete in seconds.

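Step 3: sender pool processes per-device send tasks. With 100+ sender hosts each holding ~50 concurrent HTTP/2 connections to providers (5,000+ connections in total), each connection needs only roughly 30-35 sends/sec to reach 170K sends/sec in aggregate, well under a per-connection limit of a few hundred messages per second.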

Key design choice: don't try to fanout 10M sends from one place. The whole pipeline is built around 'one logical submission becomes N batches becomes N×K tasks', with each stage scaling horizontally. This is the same pattern as the celebrity-fanout problem in Design Twitter — same architectural shape, different domain.
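The planner step and the sizing behind it can be sketched together. The generator below assumes the subscriber list arrives as an iterable (for example, a streamed shard scan) and splits it into batch tasks without loading everything into memory; `plan_batches` and the task dict shape are hypothetical. The constants at the bottom restate the figures from the steps above.

```python
from itertools import islice

def plan_batches(notification_id: str, subscriber_ids, batch_size: int = 10_000):
    """Yield fanout-batch tasks without materializing the full subscriber list."""
    it = iter(subscriber_ids)
    index = 0
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield {"batch_id": f"{notification_id}:{index}", "device_ids": chunk}
        index += 1

# Sizing check using the figures from the text.
subscribers = 10_000_000
batch_count = subscribers // 10_000                # 1,000 fanout-batch tasks
required_rate = subscribers / 60                   # ~166,667 sends/sec to finish in 60 s
connections = 100 * 50                             # 100 hosts x ~50 connections each
per_connection_rate = required_rate / connections  # ~33 sends/sec per connection

# Small demonstration: 25,000 subscribers become three batches.
demo = list(plan_batches("n1", range(25_000)))
```

The batch_id is derived from the notification and the batch index, so a restarted planner regenerates the same ids and the batch workers can skip work they have already done.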

Delivery-guarantee tier. Not all notifications deserve the same delivery semantics, and trying to give everything at-least-once is expensive (more retries, more provider load, more bandwidth).

Transactional class — at-least-once. Login codes, payment confirmations, order shipped. These notifications matter enough that delivering twice is acceptable but failing to deliver is not. Sender retries on transient errors with exponential backoff up to a TTL (commonly 5-15 minutes); the platform handles at-most-once on its side, so 'at-least-once at the platform handoff' is the strongest guarantee your service can offer. The client app deduplicates by notification_id to avoid showing duplicates if both attempts succeed.

Marketing class — best-effort. Promotional pushes, newsletter notifications. These have low cost-of-loss and high cost-of-duplication (a user who sees the same promo twice unsubscribes). Sender attempts once; on transient error, drops. Retries would amplify provider load and increase duplicate risk.

Near-real-time class — at-least-once with short TTL. Sports score updates, breaking news. Matters enough to retry, but only briefly — a score update from 5 minutes ago is useless. TTL is 30-60 seconds; after that the notification is dropped even if undelivered.

The class is encoded in the submission and respected at every stage: fanout, sender, retry policy. Mixing classes (treating a marketing send like a login code) is the most common architectural mistake — it inflates load and breaks the platform's rate-limit headroom for the things that actually matter.
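One way to make the class "respected at every stage" is a single policy table that the sender consults before any retry. The sketch below takes the TTL values from the ranges given above (15 minutes for transactional, 60 seconds for near-real-time); the zero TTL for marketing is an assumption that encodes "attempt once, never retry", and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetryPolicy:
    retry_on_transient: bool
    ttl_seconds: int          # stop retrying once the notification is this old

POLICIES = {
    "transactional": RetryPolicy(retry_on_transient=True, ttl_seconds=15 * 60),
    "near_real_time": RetryPolicy(retry_on_transient=True, ttl_seconds=60),
    "marketing": RetryPolicy(retry_on_transient=False, ttl_seconds=0),
}

def should_retry(delivery_class: str, age_seconds: float) -> bool:
    """Decide whether a transient failure is worth another attempt."""
    policy = POLICIES[delivery_class]   # unknown classes raise KeyError on purpose
    return policy.retry_on_transient and age_seconds < policy.ttl_seconds
```

Raising on an unknown class is deliberate: silently defaulting to the transactional policy is exactly the class-mixing mistake described above.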

Fourth surface: rate limiting and provider abuse. Sending too many notifications to one user (say, more than 10 per minute) risks getting the sender throttled or flagged by the platform. The service enforces per-user rate limits at fanout: it drops or coalesces consecutive notifications to the same user within a short window. This is also a UX win, since users hate notification spam.
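A per-user limit at fanout can be sketched as a sliding-window counter. The version below assumes in-memory state and an explicit `now` argument for testability; a multi-host fanout tier would need shared state. The class name and the choice to drop (rather than batch) excess sends are assumptions.

```python
from collections import defaultdict, deque

class PerUserLimiter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(deque)   # user_id -> timestamps of recent sends

    def allow(self, user_id: str, now: float) -> bool:
        """Return True and record the send if the user is under the limit."""
        q = self._events[user_id]
        # Evict timestamps that have aged out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

limiter = PerUserLimiter(limit=3, window_seconds=60)
decisions = [limiter.allow("u1", t) for t in (0, 1, 2, 3)]   # fourth send is blocked
later = limiter.allow("u1", 61)                              # oldest entries have expired
```

A natural refinement is to exempt the transactional class from this limiter, so a burst of marketing sends cannot block a login code.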

Common mistakes

  • Storing the device token as the primary key instead of a stable device_id — every token rotation appears as a 'new device'
  • Skipping the token-prune feedback loop — registry rots with dead tokens, provider error rates climb
  • Trying to fanout a 10M-subscriber topic from a single worker — sequential sending takes hours
  • Treating all notifications as at-least-once — marketing duplicates drive opt-outs and inflate provider cost
  • Forgetting that the third-party provider is a black-box dependency — your service is responsible only up to the platform handoff

Likely follow-up questions

  • How would you implement 'rich notifications' with images, action buttons, and interactive replies?
  • What changes if you need to deliver notifications even to devices that have been offline for days (sync on next app open)?
  • How would you support a 'silent notification' that wakes the app to fetch data without showing a visible alert?
  • How would you let an app A/B-test notification copy with two variants delivered to a 50/50 user split?
  • How would you detect that one of your customer apps is sending spam (high opt-out rate) and rate-limit them globally?

Frequently asked questions

Why can't I just rely on the platform provider's API directly?
You can — and apps with <100K users often do. The push notification service exists to solve the problems that emerge at scale: token-lifecycle hygiene, topic fanout, multi-platform unified API, scheduling, preferences, audit, and shared connection-pool reuse across many tenants.
Do I need to design the platform provider itself?
No. Treat APNS, FCM, and web push as black-box dependencies — they have their own QoS, rate limits, and delivery semantics, and your service is responsible only up to the handoff. Drilling into the provider's internals is out of scope unless the interviewer specifically asks.
Is the topic-fanout pattern different from social-feed fanout?
Same shape, different domain. Social feed fanout is one post → N follower feeds (in your service). Topic notification fanout is one notification → N subscriber devices (out to the provider). The pipeline stages — planner, batch worker, per-device worker — are identical.
What is the senior signal for this question?
Three: (1) you separate at-least-once from best-effort delivery and explain when each applies; (2) you handle the device-token lifecycle with a prune feedback loop; (3) you decompose topic fanout into batched workers rather than sequential iteration. Missing the delivery-class distinction is the most common reason candidates are marked down.