Design an Online Payments System — System Design Interview Guide

Design an online payments system asks you to build the backend that merchants integrate to accept card payments: tokenize cards, authorize charges, capture funds, handle webhooks, settle to merchant bank accounts, and reconcile every cent. The hard part is exactly-once semantics with money on the line.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • PayPal
  • Block
  • Adyen
  • Plaid
  • Meta

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Tokenize a card number into an opaque token usable for future charges (PCI-compliant card vault)
  • Authorize a charge: verify funds, decrement available balance, return a charge_id with auth status
  • Capture an authorized charge (transfer the funds, often hours or days after authorization)
  • Issue webhooks to the merchant on payment events (succeeded, failed, refunded, disputed)
  • Refund and dispute (chargeback) workflows with full audit trail

Non-functional requirements

  • Exactly-once charging: a network retry must never double-charge a customer
  • Authorization latency: <500ms p99 from merchant API call to authorization decision
  • Availability: 99.99%+; payment outages are direct revenue loss for every merchant on the platform
  • Scale: ~1B+ transactions/year, peak ~10K transactions/sec on Black Friday, ~$1T+ annualized payment volume

Capacity estimation

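Anchor on public scale for the largest payments platforms: ~10M+ active merchants and ~1B+ transactions/year, which is ~30 transactions/sec on average (1B divided by ~31.5M seconds per year). Peak load during Black Friday and holiday surges runs 300-1000x that average (10K+ transactions/sec at the platform level). The average transaction is small (~$30-50), but the long tail includes high-value B2B payments and subscription invoices in the thousands. Treat these as order-of-magnitude anchors: 1B transactions at ~$30-50 each is only ~$30-50B/year, so a ~$1T+ annual volume implies a far higher transaction count or average ticket, and it is worth stating which one you are designing for.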

Storage per transaction is ~2-5 KB (transaction row + audit log entry + webhook delivery records). Annual transaction storage: ~5 TB/year primary data + ~3-5x in audit logs and event history. The card vault (PCI-scoped, encrypted, isolated network) holds tens of millions of card-number-to-token mappings; it's tiny in bytes but expensive in compliance scope.
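The estimates above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch only; the function names are illustrative assumptions, and the inputs are the round figures quoted in this section.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def average_tps(transactions_per_year: float) -> float:
    """Average transactions per second over a full year."""
    return transactions_per_year / SECONDS_PER_YEAR


def annual_storage_tb(transactions_per_year: float, bytes_per_txn: float) -> float:
    """Primary storage per year in terabytes (1 TB = 1e12 bytes)."""
    return transactions_per_year * bytes_per_txn / 1e12


txns_per_year = 1e9
avg_tps = average_tps(txns_per_year)            # roughly 31.7
peak_multiplier = 10_000 / avg_tps              # roughly 315x the average
storage_low_tb = annual_storage_tb(txns_per_year, 2_000)   # 2 KB per transaction
storage_high_tb = annual_storage_tb(txns_per_year, 5_000)  # 5 KB per transaction

print(f"average: {avg_tps:.1f} TPS, 10K TPS peak is {peak_multiplier:.0f}x average")
print(f"primary storage: {storage_low_tb:.0f}-{storage_high_tb:.0f} TB/year")
```

The 2-5 TB/year range brackets the ~5 TB/year figure, and a 10K TPS peak sits at the low end of the 300-1000x multiplier.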

The shape that matters: this is not a high-QPS problem. It's a correctness problem. 10K TPS at peak is small in modern infrastructure terms. The challenge is exactly-once charging across retries, network failures, and partial component outages — every error must be deterministic, every state transition must be auditable, every charge must reconcile against the card-network-reported total at end-of-day.

High-level design

Five core domains: card vault, charge engine, webhook delivery, settlement, and reconciliation. Each is a separate service because they have different security, latency, and durability requirements.

The card vault is a PCI-scoped service. It accepts a raw card number over a TLS endpoint, encrypts it with an HSM-backed key, stores the ciphertext in a dedicated isolated store, and returns an opaque token (e.g. 'tok_abc123'). All other services see only tokens, never raw PANs. The vault is the only system in the PCI compliance scope, which dramatically reduces audit surface.
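A minimal sketch of the tokenization flow, assuming an in-memory store and a pluggable cipher. `CardVault`, `tokenize`, `detokenize`, and `demo_xor_cipher` are hypothetical names for illustration. The XOR stand-in is not encryption; a real vault would call an HSM-backed key service and a dedicated isolated store instead.

```python
import secrets
from typing import Callable, Dict

Cipher = Callable[[bytes], bytes]


class CardVault:
    """Toy vault: maps random opaque tokens to encrypted card numbers."""

    def __init__(self, encrypt: Cipher, decrypt: Cipher) -> None:
        self._encrypt = encrypt
        self._decrypt = decrypt
        self._store: Dict[str, bytes] = {}  # token -> ciphertext

    def tokenize(self, pan: str) -> str:
        # The token is random, not derived from the PAN, so it reveals nothing.
        token = "tok_" + secrets.token_urlsafe(16)
        self._store[token] = self._encrypt(pan.encode("ascii"))
        return token

    def detokenize(self, token: str) -> str:
        # Only the component that talks to the card network should call this.
        return self._decrypt(self._store[token]).decode("ascii")


def demo_xor_cipher(key: bytes) -> Cipher:
    """Placeholder for an HSM call. XOR is NOT real encryption."""
    def apply(data: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return apply


xor = demo_xor_cipher(b"demo-key-not-secret")
vault = CardVault(encrypt=xor, decrypt=xor)
token = vault.tokenize("4242424242424242")  # a well-known test card number
```

Every service outside the vault would store and pass around only `token`, which is what keeps those services out of PCI scope.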

The charge engine accepts charge requests from merchants. Each request carries an idempotency_key (the merchant chooses one per logical charge). The engine writes an intent row to a sharded relational store with the idempotency_key as a unique constraint — duplicate requests with the same key return the original outcome without re-charging. The engine then calls the card network (Visa, Mastercard, etc.) via a connector service, records the network response, and commits the final charge status. Every state transition is written to an append-only event log; the current charge row is a materialized projection of the events.
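The intent-row flow can be sketched with SQLite standing in for the sharded relational store. The table layout, the `create_charge` function, and the `authorize` callback (a stand-in for the card network connector) are assumptions for illustration, not a prescribed schema. For brevity the sketch updates the charge row and appends the event in the same transaction instead of deriving the row from the event log.

```python
import sqlite3
import uuid
from typing import Callable, Tuple

CHARGE_SCHEMA = """
CREATE TABLE charges (
    charge_id       TEXT PRIMARY KEY,
    merchant_id     TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    amount_cents    INTEGER NOT NULL,
    status          TEXT NOT NULL,
    UNIQUE (merchant_id, idempotency_key)
);
CREATE TABLE charge_events (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,
    charge_id  TEXT NOT NULL,
    event_type TEXT NOT NULL
);
"""


def create_charge(db: sqlite3.Connection, merchant_id: str, idempotency_key: str,
                  amount_cents: int,
                  authorize: Callable[[str, int], bool]) -> Tuple[str, str]:
    """Return (charge_id, status); a repeated key returns the stored outcome."""
    charge_id = "ch_" + uuid.uuid4().hex
    try:
        with db:  # one transaction: intent row plus its first event
            db.execute(
                "INSERT INTO charges (charge_id, merchant_id, idempotency_key,"
                " amount_cents, status) VALUES (?, ?, ?, ?, 'pending')",
                (charge_id, merchant_id, idempotency_key, amount_cents))
            db.execute(
                "INSERT INTO charge_events (charge_id, event_type)"
                " VALUES (?, 'created')", (charge_id,))
    except sqlite3.IntegrityError:
        # Duplicate key: return what was recorded, never call the network again.
        row = db.execute(
            "SELECT charge_id, status FROM charges"
            " WHERE merchant_id = ? AND idempotency_key = ?",
            (merchant_id, idempotency_key)).fetchone()
        return row[0], row[1]

    status = "authorized" if authorize(charge_id, amount_cents) else "declined"
    with db:  # record the outcome and append the matching event together
        db.execute("UPDATE charges SET status = ? WHERE charge_id = ?",
                   (status, charge_id))
        db.execute("INSERT INTO charge_events (charge_id, event_type)"
                   " VALUES (?, ?)", (charge_id, status))
    return charge_id, status
```

If a retry arrives while the first request is still in flight, this sketch returns the `'pending'` status; a production engine would need to decide whether to block, poll, or return a conflict in that case.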

Webhook delivery is a queue-backed retry service. When a charge event fires (succeeded, failed, refunded), an event is written to a queue with the merchant's registered webhook URL. A worker pool drains the queue and HTTP-POSTs to the merchant, retrying with exponential backoff on 5xx or timeout. Idempotency on the merchant side is the merchant's responsibility — every event includes a unique event_id the merchant deduplicates against.
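The retry loop might look like the sketch below. The `post` and `sleep` callables are injected so the logic can run without a network; `deliver`, `backoff_delays`, the 1-second base, the 1-hour cap, and the choice to stop on 4xx responses are all assumptions rather than anything the design above specifies.

```python
from typing import Callable, List


def backoff_delays(max_attempts: int, base_seconds: float = 1.0,
                   cap_seconds: float = 3600.0) -> List[float]:
    """Delay before each retry: base * 2^n, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** n)
            for n in range(max_attempts - 1)]


def deliver(event: dict, url: str,
            post: Callable[[str, dict], int],
            sleep: Callable[[float], None],
            max_attempts: int = 5) -> bool:
    """POST the event; retry on 5xx or timeout. False means dead-letter it."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            status = post(url, event)
        except TimeoutError:
            status = None  # a timeout is handled like a 5xx
        if status is not None and 200 <= status < 300:
            return True
        retryable = status is None or status >= 500
        if not retryable or attempt == max_attempts - 1:
            return False  # caller moves the event to a dead-letter queue
        sleep(delays[attempt])
    return False
```

In practice the delays would usually carry random jitter so that many failed deliveries to one merchant do not retry in lockstep, and `event` would include the `event_id` the merchant deduplicates on.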

Settlement is a daily batch process that aggregates captured charges per merchant, deducts fees, and initiates a bank-transfer payout via an ACH or wire connector. Reconciliation runs daily as well: it compares the platform's transaction ledger against the card network's settlement report and surfaces any mismatch as an alert for the operations team.
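The reconciliation comparison reduces to a keyed diff between two sources. A minimal sketch, assuming both the platform ledger and the card network's settlement report can be loaded as mappings from charge ID to amount in cents; the `Mismatch` type and `reconcile` function are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Mismatch(NamedTuple):
    charge_id: str
    ledger_cents: Optional[int]   # None: the platform has no record
    network_cents: Optional[int]  # None: the network report has no record


def reconcile(ledger: Dict[str, int],
              network_report: Dict[str, int]) -> List[Mismatch]:
    """Return every charge whose amount differs or that exists on one side only."""
    mismatches = []
    for charge_id in sorted(ledger.keys() | network_report.keys()):
        ours = ledger.get(charge_id)
        theirs = network_report.get(charge_id)
        if ours != theirs:
            mismatches.append(Mismatch(charge_id, ours, theirs))
    return mismatches


ledger = {"ch_1": 1000, "ch_2": 2500, "ch_3": 700}
network_report = {"ch_1": 1000, "ch_2": 2400, "ch_4": 900}
alerts = reconcile(ledger, network_report)
# ch_2 differs in amount, ch_3 is missing from the network report,
# and ch_4 is missing from the ledger.
```

Each entry in `alerts` would feed the operations alert described above; the charge missing from the ledger is the case the platform cannot discover any other way.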

Deep dive — the hard problem

The deep dive is exactly-once semantics. Three problems compound: merchant retries, internal retries, and network ambiguity. A merchant whose request times out may retry; without idempotency, that's a double charge. An internal service that crashes mid-flight may retry, with the same risk. A card network call that returns 'timeout' may or may not have actually charged the card, and the platform can't tell from the response alone.

The standard answer is a three-layer idempotency design. Layer one: the merchant-supplied idempotency_key. The charge engine writes an intent row keyed on (merchant_id, idempotency_key) with a unique constraint; duplicate writes return the original result with no side effect. Layer two: an internal request_id propagated through the call chain so retries within the platform are deduplicated at every step. Layer three: a reconciliation pass against the card network. For ambiguous responses (timeout, network error, 'unknown'), the engine writes the charge in 'pending' state and runs a deferred status-check against the card network minutes later, then commits the true outcome. Blindly retrying an ambiguous call risks a double charge, and blindly failing it risks losing a charge the network actually approved, so mention the deferred status check explicitly.
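Layer three can be sketched as two functions: one that records the outcome of the first network call, and one that later resolves anything left pending. The names `NetworkResult`, `authorize_once`, and `resolve_pending`, and the use of a plain dict as the charge store, are assumptions for illustration.

```python
from enum import Enum
from typing import Callable, Dict


class NetworkResult(Enum):
    APPROVED = "approved"
    DECLINED = "declined"
    UNKNOWN = "unknown"


STATUS_FOR = {
    NetworkResult.APPROVED: "authorized",
    NetworkResult.DECLINED: "declined",
    NetworkResult.UNKNOWN: "pending",
}


def authorize_once(charge_id: str,
                   call_network: Callable[[str], NetworkResult],
                   statuses: Dict[str, str]) -> str:
    """Call the network exactly once; an ambiguous answer becomes 'pending'."""
    try:
        result = call_network(charge_id)
    except TimeoutError:
        result = NetworkResult.UNKNOWN  # the card may or may not be charged
    statuses[charge_id] = STATUS_FOR[result]
    return statuses[charge_id]


def resolve_pending(statuses: Dict[str, str],
                    query_network: Callable[[str], NetworkResult]) -> int:
    """Deferred status check: ask the network what happened, never re-charge."""
    resolved = 0
    for charge_id, status in list(statuses.items()):
        if status != "pending":
            continue
        result = query_network(charge_id)
        if result is NetworkResult.UNKNOWN:
            continue  # still ambiguous; leave it for the next pass
        statuses[charge_id] = STATUS_FOR[result]
        resolved += 1
    return resolved
```

The key design choice is that `resolve_pending` only queries status. It never sends a second authorization, which is what would risk a double charge.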

Second deep dive: the dual-write problem. When a charge succeeds, two things must happen together: the charge row commits AND the webhook event publishes. If these are two separate writes, one can succeed without the other (the classic 'database commits, message queue drops' failure). The standard solution is the transactional outbox pattern: write the webhook event into an outbox table in the same database transaction as the charge row, then have a separate worker tail the outbox table and publish to the queue. The shared transaction guarantees that the charge row and its outbox event commit together or not at all. The worker may publish an event more than once if it crashes before marking the row as sent, but at-least-once delivery to the queue is acceptable because merchants deduplicate webhooks by event_id.
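A minimal sketch of the outbox pattern, again with SQLite standing in for the relational store. The schema, `mark_succeeded`, `relay_once`, and the `publish` callback (a stand-in for the queue client) are hypothetical names chosen for this example.

```python
import json
import sqlite3
from typing import Callable

OUTBOX_SCHEMA = """
CREATE TABLE charges (charge_id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    event_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    payload   TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def mark_succeeded(db: sqlite3.Connection, charge_id: str) -> None:
    """Update the charge and enqueue its event in ONE transaction."""
    payload = json.dumps({"type": "charge.succeeded", "charge_id": charge_id})
    with db:  # commits both writes, or rolls both back on error
        db.execute("UPDATE charges SET status = 'succeeded' WHERE charge_id = ?",
                   (charge_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_once(db: sqlite3.Connection,
               publish: Callable[[int, dict], None]) -> int:
    """Publish unpublished outbox rows in order; return how many were sent."""
    rows = db.execute("SELECT event_id, payload FROM outbox"
                      " WHERE published = 0 ORDER BY event_id").fetchall()
    sent = 0
    for event_id, payload in rows:
        # If publish raises, the row stays unpublished and is retried later.
        publish(event_id, json.loads(payload))
        # A crash between publish and this update causes a duplicate publish,
        # which is why delivery is at-least-once rather than exactly-once.
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE event_id = ?",
                       (event_id,))
        sent += 1
    return sent
```

A worker would call `relay_once` on a short interval, or tail the database's change log instead of polling.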

Third deep dive: chargebacks and disputes. A chargeback is a customer-initiated reversal weeks after the original charge. The system must hold sufficient evidence (transaction timestamp, IP, browser fingerprint, billing address, shipping confirmation) for the dispute window (typically 120-180 days). Dispute responses are filed through the card network on a strict deadline. Mention the audit-trail requirement; this is what interviewers probe to test compliance awareness.

Common mistakes

  • Forgetting idempotency keys entirely: expect the interviewer to ask 'what if the merchant's request times out and the merchant retries?' within the first 10 minutes
  • Storing card numbers in the main application database, which pulls that database and every service that touches it into PCI scope
  • Treating webhooks as fire-and-forget — they need retries, dead-letter queues, and per-merchant rate limiting
  • Skipping the ambiguous-network-response case — assuming every card network call either clearly succeeds or clearly fails
  • Skipping reconciliation against the card network's settlement report — the only way to catch missed transactions

Likely follow-up questions

  • How would you handle a card-network outage where authorizations are timing out at 50% rate?
  • What changes if you support multiple currencies and same-day FX conversion?
  • How would you implement subscription billing with automatic retry on failed renewal?
  • How would you detect and prevent card-testing fraud (attackers running stolen cards through the platform)?
  • How would you support split payments where one charge is distributed across multiple merchants?

Frequently asked questions

How long is a Design Payments system-design interview?
60 minutes is typical at payments-focused companies (PayPal, Block, Adyen). The expectation is full coverage of idempotency + card vault + webhook delivery + settlement. Source: Glassdoor PayPal/Block/Adyen 2022-2024 reports.
Do I need to know specific card-network protocols?
No — naming 'card network connector' as an external boundary is enough. Knowing ISO 8583 or specific authorization codes is bonus signal but not required unless this is a payments-platform-team interview.
What's the single most-asked follow-up?
Idempotency. Specifically: 'walk me through what happens if the merchant's request times out before they see the response, and they retry with the same idempotency key.' If you can't trace the call path through the engine and explain why the second call returns the original result without a second charge, that's a no-hire at senior+.
Should I discuss PCI compliance in detail?
Mention the card vault as a PCI-scoped service and explain why isolating the card vault narrows audit scope. Drilling into specific PCI DSS requirements is overkill unless this is a security-team interview.