THN Interview Prep

Design a Payment System (Card & Wallet Rails)

1. Requirements

Functional

  • Initiate payments (authorize/capture or one-shot), refunds, voids; support partial captures and multi-currency display with settlement currency rules.
  • Persist immutable financial records for auditing; expose APIs for merchants, finance ops, and read-only auditors.
  • Integrate with PSPs/acquirers behind a stable adapter; consume webhooks for async authorization, capture, dispute, and payout events.
  • Model dispute stages (chargeback opened, evidence due, won/lost) with workflow hooks—exact SLA tables omitted.
  • Never persist full PAN in application databases; accept only PSP payment method tokens or network tokens from client flows.

Non-Functional

  • Scale: global initiation TPS in the thousands to low millions for large processors; correctness dominates marketing QPS claims.
  • Latency: p99 user-visible authorization under ~2–5 s including PSP round-trips; internal ledger commits sub-second per transaction where possible.
  • Availability: 99.99% for payment initiation APIs; graceful degradation (queued processing, clear status) when PSPs partition.
  • Consistency: serializable posting for ledger lines affecting the same accounts; PSP mirror state is eventually consistent and reconciled against the ledger.
  • Durability: append-only ledger, WORM audit trail, cross-AZ replication, point-in-time backup for regulatory intervals.

Compliance & PCI DSS Scope (High Level)

  • Minimize CDE: application servers must not store full PAN, CVV, or mag stripe; use hosted fields / Elements so card data posts directly to the PSP from the browser where possible.
  • Tokenization: databases store only opaque payment-method tokens issued by the PSP, or network tokens, which cannot be reversed to a PAN outside the PSP or network vault; scope stays lower than legacy vault-everywhere designs.
  • Network controls: segment networks; keys live in HSM or PSP; TLS 1.2+ everywhere; PAN scrubbing in logs and APM tools.
  • Access: least-privilege RBAC for refund operators; quarterly access reviews—policy-heavy, referenced alongside security topics under ../fundamentals/.

Out of Scope

  • Card network clearing and interchange rulebooks (Visa/Mastercard) beyond gross flows.
  • Full KYC/AML and sanctions screening programs—only integration seams noted.
  • Cryptocurrency settlement and on-chain finality.

2. Back-of-Envelope Estimations

Assume 50k peak TPS on authorization attempts (aggregator scale—not issuer core).

  • Writes: Each successful flow creates intent, auth, optional capture, fee, and ledger lines—plan 3–10 durable rows per payment → 150k–500k row writes/s at peak when sharded by merchant or payment id.
  • Storage: ~2 KB average per ledger line including indexes → ~2 TB/month at 1B lines/month before compression; 7-year retention is ~168 TB raw (2 TB × 84 months), and replicas, backups, and legal hold push cold tiers toward petabyte totals.
  • Idempotency: Redis/cluster holding keyHash → response with 64–128 byte values and 24–72 h TTL; millions of live keys; memory stays in the single-digit GB per region at peak, and lower if responses are compressed or stored by reference only.
  • Webhooks: Inbound events often run 5–20× user-initiated actions (retries, settlement ticks), so internal Kafka/SQS message queues must absorb bursts of hundreds of thousands of events per second with backpressure.
  • Reconciliation jobs: Daily files GB-scale; batch CPU partitioned by date × currency to finish within overnight windows.

Show intermediate arithmetic when presenting estimates—align with back-of-envelope habits.
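As a worked version of the estimates above, the short script below spells out the intermediate arithmetic. The inputs come from this section, except `ASSUMED_LIVE_KEYS`, which is a hypothetical figure chosen only to illustrate the idempotency memory calculation.

```python
# Back-of-envelope arithmetic for the estimates in this section.
PEAK_TPS = 50_000                   # peak authorization attempts per second
ROWS_PER_PAYMENT = (3, 10)          # durable rows per payment (low, high)
BYTES_PER_LINE = 2_000              # ~2 KB per ledger line including indexes
LINES_PER_MONTH = 1_000_000_000     # 1B ledger lines per month
RETENTION_MONTHS = 7 * 12           # 7-year retention
WEBHOOK_MULTIPLIER = (5, 20)        # inbound events per user-initiated action
ASSUMED_LIVE_KEYS = 20_000_000      # hypothetical: live idempotency keys per region
BYTES_PER_KEY_ENTRY = 128           # upper end of the 64-128 byte value range


def scale(base, factors):
    """Multiply a base rate by a (low, high) pair of factors."""
    return tuple(base * f for f in factors)


def terabytes(count, bytes_each):
    """Total size in decimal terabytes."""
    return count * bytes_each / 1e12


row_writes_per_sec = scale(PEAK_TPS, ROWS_PER_PAYMENT)
monthly_tb = terabytes(LINES_PER_MONTH, BYTES_PER_LINE)
retained_tb_raw = monthly_tb * RETENTION_MONTHS   # before replicas and backups
webhook_events_per_sec = scale(PEAK_TPS, WEBHOOK_MULTIPLIER)
idempotency_gb = ASSUMED_LIVE_KEYS * BYTES_PER_KEY_ENTRY / 1e9

print(f"row writes/s:       {row_writes_per_sec[0]:,} to {row_writes_per_sec[1]:,}")
print(f"ledger growth:      {monthly_tb:.1f} TB/month, {retained_tb_raw:.0f} TB raw over 7 years")
print(f"webhook events/s:   {webhook_events_per_sec[0]:,} to {webhook_events_per_sec[1]:,}")
print(f"idempotency memory: {idempotency_gb:.2f} GB (values only, no key overhead)")
```

The raw retention figure excludes replicas, backups, and secondary copies, so it is a lower bound on the cold-tier footprint.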

Chargeback reserve: Model ~0.2–1.0% of GMV as subject to hold accounts on the ledger; pending balance rows prevent payout of disputed funds without blocking fee recognition on available funds under the rules your finance team defines.

Settlement calendar: Cross-border settlements may lag T+2 or worse—size suspense accounts for in-flight FX conversions so internal dashboards do not confuse authorized card captures with cash-in-bank availability.

3. API Design

POST /v1/payments
Headers: Idempotency-Key: <uuid>
Body: { merchantId, amountMicros, currency, customerId, paymentMethodToken, metadata }
-> 201 { paymentId, status, pspClientSecret? }

POST /v1/payments/{paymentId}/capture
Headers: Idempotency-Key: <uuid>
Body: { amountMicros }
-> 200 { paymentId, status: "captured" }

POST /v1/refunds
Headers: Idempotency-Key: <uuid>
Body: { paymentId, amountMicros, reason }
-> 201 { refundId, status }

GET /v1/payments/{paymentId}
-> 200 { paymentId, status, amountMicros, currency, ledgerSummary }

GET /v1/merchants/{merchantId}/balance
-> 200 { availableMicros, pendingMicros, currency, asOfVersion }

Errors: 402 card declined; 409 conflict when the same key arrives with a different body (an idempotent replay, same key and same body, returns the original response and status rather than an error); 424 dependency failure talking to the PSP; 429 merchant velocity limit (rate limiter).
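The replay-versus-conflict rule for `Idempotency-Key` can be sketched as follows. This is a minimal in-memory illustration, not a production design: `IdempotencyStore`, `IdempotencyConflict`, and the `handler` callback are hypothetical names, and TTL expiry and concurrent requests on the same key are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredResponse:
    request_hash: str   # hash of the canonical request body
    status_code: int
    body: dict


class IdempotencyConflict(Exception):
    """Same key reused with a different request body (would map to HTTP 409)."""


class IdempotencyStore:
    """In-memory stand-in for a Redis-backed idempotency store (assumption)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _hash(*parts):
        return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()

    def execute(self, merchant_id, idempotency_key, request_body, handler):
        # Scope the key per merchant so two merchants cannot collide.
        key_hash = self._hash(merchant_id, idempotency_key)
        # Canonicalize the body so key order does not change the hash.
        request_hash = self._hash(json.dumps(request_body, sort_keys=True))

        existing = self._entries.get(key_hash)
        if existing is not None:
            if existing.request_hash != request_hash:
                raise IdempotencyConflict(idempotency_key)
            # Replay: return the stored response without running the handler.
            return existing.status_code, existing.body

        status_code, body = handler(request_body)
        self._entries[key_hash] = StoredResponse(request_hash, status_code, body)
        return status_code, body


calls = []

def create_payment(body):
    # Hypothetical handler standing in for POST /v1/payments.
    calls.append(body)
    return 201, {"paymentId": "pay_1", "status": "processing"}

store = IdempotencyStore()
request = {"merchantId": "m_1", "amountMicros": 5_000_000, "currency": "USD"}
first = store.execute("m_1", "key-1", request, create_payment)
replay = store.execute("m_1", "key-1", request, create_payment)
```

A real store would also need an atomic "claim" step (for example a set-if-absent write) so that two concurrent requests with the same key cannot both reach the handler.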

4. Data Model

  • PaymentIntent: paymentId (ULID/UUIDv7), merchantId, amountMicros, currency, status, idempotencyKeyHash, pspIntentId, customerId, timestamps.
  • LedgerEntry (double-entry): entryId, debitAccountId, creditAccountId, amountMicros, currency, paymentId, type (auth_hold, capture, refund, fee), postedAt. Immutable; corrections are new reversing lines.
  • Account: accountId, ownerType (merchant_settlement, platform_fees, clearing_suspense), currency.
  • BalanceSnapshot (materialized): accountId, availableMicros, pendingMicros, version, updatedAt. Derived from the ledger under defined rules; not a second source of truth.
  • WebhookCursor: pspName, lastEventId, lastProcessedAt for ordered processing.
  • OutboxRow: id, payload, createdAt, sentAt for reliable PSP calls.

Ledger vs balance: The ledger is the authoritative journal for audits and three-way reconciliation with PSP files and bank statements. Balances exist for fast reads and velocity limits; invariants are enforced in the same transaction as new lines or healed by scheduled reconciliation jobs that sum ledger vs snapshot per account (replication patterns apply to read replicas with caution for money reads).
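A scheduled job that sums the ledger against snapshots per account might look like the sketch below. It assumes a single currency and a sign convention in which credits increase an account and debits decrease it; both are simplifications for illustration, and `find_drift` is a hypothetical helper name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    entry_id: str
    debit_account_id: str
    credit_account_id: str
    amount_micros: int


def balances_from_ledger(entries):
    """Derive per-account balances from double-entry lines.

    Assumed convention: a credit adds to the account, a debit subtracts.
    """
    totals = defaultdict(int)
    for entry in entries:
        totals[entry.debit_account_id] -= entry.amount_micros
        totals[entry.credit_account_id] += entry.amount_micros
    return dict(totals)


def find_drift(entries, snapshots):
    """Return {accountId: (ledger_total, snapshot_total)} where they disagree."""
    derived = balances_from_ledger(entries)
    drift = {}
    for account_id in set(derived) | set(snapshots):
        ledger_total = derived.get(account_id, 0)
        snapshot_total = snapshots.get(account_id, 0)
        if ledger_total != snapshot_total:
            drift[account_id] = (ledger_total, snapshot_total)
    return drift


ledger = [
    LedgerEntry("e1", "clearing_suspense", "merchant_settlement", 10_000_000),
    LedgerEntry("e2", "merchant_settlement", "platform_fees", 300_000),
]
snapshots = {
    "clearing_suspense": -10_000_000,
    "merchant_settlement": 9_700_000,
    "platform_fees": 250_000,   # deliberately stale to show a detected break
}
drift = find_drift(ledger, snapshots)
```

Because every line debits one account and credits another by the same amount, the derived balances always sum to zero, which gives the job a cheap sanity check before it compares individual accounts.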

SQL (Postgres or CockroachDB) with strong isolation for posting groups; Redis for idempotency and optional distributed locks; indexes on (merchantId, createdAt), (paymentId), (accountId, postedAt).

5. High-Level Architecture


API gateway terminates TLS and applies authN/Z. Payment API validates requests, checks idempotency, and orchestrates ledger + outbox in one logical unit of work. PSP worker drains the outbox with retries. Webhook ingress dedupes by pspEventId and drives state transitions. Reconciliation compares PSP settlement files to ledger totals; mismatches open tickets and may block payouts. Stateless API tiers scale horizontally behind load balancers.
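The webhook ingress step can be sketched as a dedupe set plus a transition table. Only `pspEventId` comes from the text above; the `paymentId` and `status` event fields, the status names, and the transition table are assumptions made for this illustration.

```python
# Allowed status transitions (hypothetical; a real table would follow the PSP's lifecycle).
ALLOWED_TRANSITIONS = {
    "processing": {"authorized", "failed"},
    "authorized": {"captured", "voided"},
    "captured": {"refunded"},
}


class WebhookIngress:
    """In-memory sketch: dedupe by pspEventId, then apply a guarded transition."""

    def __init__(self, payment_status):
        self.payment_status = payment_status   # paymentId -> current status
        self.seen_event_ids = set()

    def handle(self, event):
        event_id = event["pspEventId"]
        if event_id in self.seen_event_ids:
            return "duplicate"                 # PSP retry of an event already processed

        current = self.payment_status.get(event["paymentId"])
        if current is None:
            # Not marked as seen, so a later redelivery can still be applied.
            return "unknown_payment"

        if event["status"] in ALLOWED_TRANSITIONS.get(current, set()):
            self.payment_status[event["paymentId"]] = event["status"]
            outcome = "applied"
        else:
            outcome = "ignored"                # out-of-order or invalid transition

        # Record the event only after it has been fully handled.
        self.seen_event_ids.add(event_id)
        return outcome


ingress = WebhookIngress({"pay_1": "processing"})
results = [
    ingress.handle({"pspEventId": "evt_1", "paymentId": "pay_1", "status": "authorized"}),
    ingress.handle({"pspEventId": "evt_1", "paymentId": "pay_1", "status": "authorized"}),
    ingress.handle({"pspEventId": "evt_2", "paymentId": "pay_1", "status": "refunded"}),
    ingress.handle({"pspEventId": "evt_3", "paymentId": "pay_1", "status": "captured"}),
]
```

In a durable version the seen-event record and the status update would be written in the same database transaction, so a crash between the two cannot lose or double-apply an event.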

6. Component Deep-Dives

  • Idempotency keys: Hash (merchantId, Idempotency-Key); store canonical request hash with response envelope; TTL aligns with card-network retry windows. Same key + body → replay; same key + different body → reject—see idempotency.
  • Posting transaction: Open txn → validate limits using balance snapshot → insert double-entry lines → bump snapshot version → enqueue outbox row → commit. On duplicate submission, the unique constraint on idempotencyKeyHash rejects the second insert and rolls back the whole posting, preventing a double charge.
  • Ledger vs balance at runtime: Hot merchants cannot afford summing billions of lines—keep rolling baseline + delta window, or partition balances per shard with checksum jobs nightly.
  • PSP unknown states: On timeout, mark processing; safe retries use stored idempotency at PSP layer; reconcile job polls intent until terminal—never guess success.
  • Reconciliation: Import settlement CSV; aggregate ledger by merchant/day/currency; compare net vs bank incoming transfers; breaks investigated before payout file generation. Pending vs available buckets map to auth hold accounts releasing on capture or expiry.
  • PCI process alignment: Services touching tokens only—no PAN paths in CI artifacts; secrets in vault; audit who exported what report.
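The posting transaction described in this list can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here for Postgres or CockroachDB; the table layouts and the `post_auth_hold` helper are assumptions for illustration, and limit validation is omitted for brevity.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE payment_intent (
    payment_id TEXT PRIMARY KEY,
    idempotency_key_hash TEXT NOT NULL UNIQUE,
    amount_micros INTEGER NOT NULL
);
CREATE TABLE ledger_entry (
    entry_id INTEGER PRIMARY KEY AUTOINCREMENT,
    payment_id TEXT NOT NULL,
    debit_account_id TEXT NOT NULL,
    credit_account_id TEXT NOT NULL,
    amount_micros INTEGER NOT NULL CHECK (amount_micros > 0),
    type TEXT NOT NULL
);
CREATE TABLE balance_snapshot (
    account_id TEXT PRIMARY KEY,
    pending_micros INTEGER NOT NULL,
    version INTEGER NOT NULL
);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    sent_at TEXT
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def post_auth_hold(conn, payment_id, key_hash, merchant_account, amount_micros):
    """Write intent, ledger line, snapshot bump, and outbox row atomically.

    Returns False, with nothing written, when the idempotency hash already exists.
    """
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "INSERT INTO payment_intent VALUES (?, ?, ?)",
                (payment_id, key_hash, amount_micros),
            )
            conn.execute(
                "INSERT INTO ledger_entry (payment_id, debit_account_id,"
                " credit_account_id, amount_micros, type)"
                " VALUES (?, 'clearing_suspense', ?, ?, 'auth_hold')",
                (payment_id, merchant_account, amount_micros),
            )
            conn.execute(
                "INSERT OR IGNORE INTO balance_snapshot VALUES (?, 0, 0)",
                (merchant_account,),
            )
            conn.execute(
                "UPDATE balance_snapshot SET pending_micros = pending_micros + ?,"
                " version = version + 1 WHERE account_id = ?",
                (amount_micros, merchant_account),
            )
            conn.execute(
                "INSERT INTO outbox (payload) VALUES (?)",
                (json.dumps({"paymentId": payment_id, "action": "authorize"}),),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_db()
first_ok = post_auth_hold(conn, "pay_1", "hash_a", "merchant_1", 5_000_000)
duplicate_ok = post_auth_hold(conn, "pay_2", "hash_a", "merchant_1", 5_000_000)
```

Because the outbox row commits with the ledger lines, the PSP worker never sees a call to make for a posting that rolled back, and a committed posting always has its side effect queued.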

7. Bottlenecks & Mitigations

  • Hot merchant ledger shard: Vertical limit then partition by sub-merchant or account suffix; sequence generation per shard avoids global contention.
  • Webhook spikes: Kafka consumer groups with partition by PSP; idempotent handlers; dead-letter queue for poison shapes.
  • Reconciliation O(n) scans: Incremental checkpoints; materialized daily rollups per merchant to compare files in O(accounts) not raw lines.
  • Replica lag on reads: For balance, serve from primary or read-after-write token after mutation; never read stale balance for payout approval without guard.

8. Tradeoffs

Decision                 | Alternative           | Why we picked
-------------------------|-----------------------|----------------------------------------
Double-entry ledger      | Single balance column | Audits, disputes, regulator requests
PSP-hosted tokenization  | Encrypt PAN in DB     | Smaller PCI footprint and key custody
Serializable core posting| Eventual balance      | Monetary correctness under concurrency
Async webhooks + outbox  | Only synchronous PSP  | Network realities; no lost side effects
Regional primary writer  | Full multi-master     | Avoids split captures across regions

9. Follow-ups (interviewer drill-downs)

  • How do you prove totals to auditors? Export immutable journal + PSP settlement + bank statement three-way tie-out per period; signed checksum files for partner pulls from the reconciliation pipeline in section 5 above.

  • What if initiation TPS grows 100×? Hash shard Postgres by merchantId; UUIDv7 time-oriented IDs; isolate hot merchants; read replicas only for non-monetary projections unless version-checked.

  • Exactly-once money movement end-to-end? Not achievable across PSP + bank + us; use idempotent APIs, ledger uniqueness constraints, and reconciliation as the proof of correctness.

  • Migrate ledger schema safely? Dual-write new columns; parallel sum validators per shard; freeze short maintenance window for constraint changes if unavoidable.

  • Multi-region active-active? Avoid it for the same payment; prefer a primary region per merchant or sticky sessions, with async cross-region replication and a conflict policy that rejects a second capture.

  • Cost controls? Tier cold ledger to object storage with Parquet; batch interchange reports; aggressive TTL on idempotency keys; rate limit sandbox APIs.

  • Network tokens vs PSP tokens? Network tokens survive PSP migration with issuer consent; design adapter interfaces so paymentMethod references are opaque handles in your schema, not vendor raw IDs everywhere.

  • Dispute evidence? Store immutable payload IDs for receipts submitted to card networks; never overwrite ledger lines, and append adjustment entries when the chargeback outcome arrives.
