Case · fintech · 2022–2023
A double-entry ledger crossed 10⁹ rows without downtime.
Shape of the problem
A payments platform had grown its primary ledger to roughly 840 million rows in a single Postgres instance. Nightly reconciliation now took nineteen hours, read-replica lag oscillated between four and forty seconds, and the single-writer posting path showed p99 latencies above 2.8 s during business hours. The team had a migration plan on paper, but nobody on it believed the plan could be executed in under a year.
What we did
We reframed the migration around three independent cutovers rather than one. First, we split the ledger into an append-only journal and a derived balances view, keeping both in the existing database. This alone cut posting latency by roughly 70% and gave us a clean write boundary to reason about. Second, we introduced an idempotency-key layer so that every write could be replayed safely, which turned the eventual shard migration into a boring copy job rather than a coordination exercise. Third, we sharded the journal by tenant-prefixed account id, behind a routing layer the application team could understand by reading fewer than two hundred lines of code.
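The three steps can be sketched in miniature. The Python below is a hypothetical in-memory model, not the client's code. It shows an append-only journal whose `post` rejects unbalanced entries and treats a repeated idempotency key as a no-op. It keeps a balances dict derived from the journal. It also includes a shard router that hashes only the tenant prefix of an account id. The names (`Journal`, `post`, `shard_for`, `copy_to_shards`), the `tenant:account` id format, the signed-amount convention, and hash-based routing are all assumptions. The case study does not say whether the real routing layer used hashing or key ranges.

```python
import hashlib
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Posting:
    account_id: str  # hypothetical "<tenant>:<account>" format
    amount: Decimal  # assumed convention: debit > 0, credit < 0


@dataclass(frozen=True)
class JournalEntry:
    idempotency_key: str  # hypothetical client-supplied key
    postings: tuple[Posting, ...]


class Journal:
    """Append-only journal plus a balances view derived from it."""

    def __init__(self) -> None:
        self._entries: list[JournalEntry] = []      # never updated in place
        self._by_key: dict[str, JournalEntry] = {}  # idempotency index
        self._balances: dict[str, Decimal] = {}     # derived, rebuildable

    def post(self, entry: JournalEntry) -> bool:
        """Append an entry; return False if its key was already applied."""
        seen = self._by_key.get(entry.idempotency_key)
        if seen is not None:
            if seen != entry:
                raise ValueError("idempotency key reused with different postings")
            return False  # replaying an applied write is a no-op
        if len(entry.postings) < 2:
            raise ValueError("a double-entry posting needs at least two legs")
        if sum((p.amount for p in entry.postings), Decimal(0)) != 0:
            raise ValueError("debits and credits must net to zero")
        self._entries.append(entry)
        self._by_key[entry.idempotency_key] = entry
        for p in entry.postings:
            current = self._balances.get(p.account_id, Decimal(0))
            self._balances[p.account_id] = current + p.amount
        return True

    def balance(self, account_id: str) -> Decimal:
        return self._balances.get(account_id, Decimal(0))

    def entries(self) -> tuple[JournalEntry, ...]:
        return tuple(self._entries)


def tenant_of(account_id: str) -> str:
    return account_id.split(":", 1)[0]


def shard_for(account_id: str, num_shards: int) -> int:
    """Route on the tenant prefix only, so a tenant's accounts share a shard."""
    digest = hashlib.sha256(tenant_of(account_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def copy_to_shards(source: Journal, shards: list[Journal]) -> int:
    """Replay every entry into its shard; safe to re-run after a partial copy."""
    copied = 0
    for entry in source.entries():
        if len({tenant_of(p.account_id) for p in entry.postings}) != 1:
            raise ValueError("this sketch assumes single-tenant entries")
        shard = shards[shard_for(entry.postings[0].account_id, len(shards))]
        if shard.post(entry):
            copied += 1
    return copied
```

Because `post` is idempotent, `copy_to_shards` can be interrupted and re-run from the beginning. Entries already present on a shard are skipped. In this sketch, that property is what reduces the migration to a copy job rather than a coordination exercise.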
Outcome
- Posting p99 latency: 2.8 s → 140 ms.
- Nightly reconciliation: 19 h → 38 min, running against the journal rather than the balances view.
- Zero customer-visible downtime across fourteen months of dual-write operation.
- The client's own engineers ran the final cutover; we observed.
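The case study does not describe the reconciliation job itself. As a hypothetical illustration of what running against the journal rather than the balances view can mean, the sketch below recomputes per-account totals directly from journal rows and flags any entry whose legs do not net to zero. The row shape (`JournalRow`), the function name, and the signed-amount convention are assumptions.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Iterable, NamedTuple


class JournalRow(NamedTuple):
    entry_id: str    # hypothetical: groups the legs of one posting
    account_id: str
    amount: Decimal  # assumed convention: debit > 0, credit < 0


def reconcile_journal(rows: Iterable[JournalRow]) -> tuple[dict[str, Decimal], list[str]]:
    """Recompute balances from the journal alone and list unbalanced entries."""
    totals = defaultdict(Decimal)        # account_id -> recomputed balance
    net_by_entry = defaultdict(Decimal)  # entry_id -> sum of its legs
    for row in rows:
        totals[row.account_id] += row.amount
        net_by_entry[row.entry_id] += row.amount
    # A correct double-entry journal has every entry netting to zero.
    unbalanced = sorted(e for e, net in net_by_entry.items() if net != 0)
    return dict(totals), unbalanced
```

Because the balances view is derived from the journal, a check written this way does not depend on the view itself being correct.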