Case · fintech · 2022–2023

A double-entry ledger crossed 10⁹ rows without downtime.

Shape of the problem

A payments platform had grown its primary ledger to roughly 840 million rows in a single Postgres instance. Nightly reconciliation took nineteen hours, read-replica lag oscillated between four and forty seconds, and the single-writer posting path showed p99 latencies above 2.8 s during business hours. The team had a migration plan on paper, but nobody on it believed the plan was executable in under a year.

What we did

We reframed the migration around three independent cutovers rather than one. First, we split the ledger into an append-only journal and a derived balances view, keeping both in the existing database. That change alone cut posting latency by roughly 70% and gave us a clean write boundary to reason about. Second, we introduced an idempotency-key layer so that every write could be replayed safely, which turned the eventual shard migration into a boring copy job rather than a coordination exercise. Third, we sharded the journal by tenant-prefixed account id, behind a routing layer small enough (under two hundred lines of code) for the application team to read and understand.
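
To make the three pieces concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not the client's code: the names Entry, Journal, derive_balances, replay, and shard_for are hypothetical, the "tenant:account" id format is assumed, and hashing only the tenant prefix (so that one tenant's accounts land on one shard) is one plausible reading of "sharded by tenant-prefixed account id".

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    account_id: str  # assumed format: "<tenant>:<account>"
    amount: int      # minor units; positive = debit, negative = credit


@dataclass
class Journal:
    """Append-only journal with an idempotency-key index (hypothetical)."""
    rows: list = field(default_factory=list)  # (key, entries), never updated
    seen: dict = field(default_factory=dict)  # idempotency key -> row index

    def post(self, key: str, entries: tuple[Entry, ...]) -> int:
        # A repeated key is a no-op that returns the original row index,
        # which is what makes retries and replays safe.
        if key in self.seen:
            return self.seen[key]
        # Double-entry invariant: debits and credits must net to zero.
        if sum(e.amount for e in entries) != 0:
            raise ValueError("unbalanced posting")
        self.rows.append((key, entries))
        self.seen[key] = len(self.rows) - 1
        return self.seen[key]


def derive_balances(journal: Journal) -> dict[str, int]:
    """Balances are derived by folding over the journal, never written directly."""
    balances: dict[str, int] = defaultdict(int)
    for _key, entries in journal.rows:
        for e in entries:
            balances[e.account_id] += e.amount
    return dict(balances)


def replay(source: Journal, target: Journal) -> None:
    """Copy job: re-post every row; already-copied keys are skipped."""
    for key, entries in source.rows:
        target.post(key, entries)


def shard_for(account_id: str, num_shards: int) -> int:
    """Route by a stable hash of the tenant prefix (an assumption, see above)."""
    tenant = account_id.split(":", 1)[0]
    digest = hashlib.sha256(tenant.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


journal = Journal()
payment = (Entry("acme:cash", 500), Entry("acme:revenue", -500))
journal.post("pay-001", payment)
journal.post("pay-001", payment)  # retry with the same key: no second row

copy = Journal()
replay(journal, copy)
replay(journal, copy)  # running the copy twice changes nothing
```

Because post ignores keys it has already seen, the copy step can be interrupted and rerun without coordination between source and target. A production version would presumably enforce the key with a unique constraint in the database rather than an in-process dictionary.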

Outcome

  • Posting p99 latency: 2.8 s → 140 ms.
  • Nightly reconciliation: 19 h → 38 min, running against the journal rather than the balances view.
  • Zero customer-visible downtime across fourteen months of dual-write operation.
  • The client's own engineers ran the final cutover; we observed.
