Case · healthtech · 2021–2022
A zero-downtime schema migration across fourteen months of dual-writes.
Shape of the problem
A clinical-records platform needed to migrate its core encounter model from a deeply nested document schema to a normalized relational representation. Regulatory constraints ruled out any read downtime longer than thirty seconds. The existing dataset was roughly 2.4 TB, with nontrivial referential-integrity dependencies across eleven downstream services.
What we did
We designed the migration as three layered systems: a dual-write proxy that could be toggled per tenant; a correctness harness that continuously compared reads from the old and new schemas and logged every divergence with enough context to debug it; and a rollback path that remained viable through the end of the migration, not just the beginning. The cutover itself took nine seconds for the largest tenant, and by then every downstream service had already been reading from the new schema for weeks.
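To make the three layers concrete, here is a minimal Python sketch of how they can fit together. It is an illustration under stated assumptions, not the code used on the engagement: the class and method names are hypothetical, and two in-memory dictionaries stand in for the document store and the relational store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class Divergence:
    """One mismatch between stores, with enough context to debug it."""
    tenant: str
    key: str
    old_value: Any
    new_value: Any


@dataclass
class DualWriteProxy:
    # Hypothetical stand-ins: plain dicts instead of real databases.
    old_store: Dict[Tuple[str, str], Any] = field(default_factory=dict)
    new_store: Dict[Tuple[str, str], Any] = field(default_factory=dict)
    dual_write_tenants: Set[str] = field(default_factory=set)
    read_new_tenants: Set[str] = field(default_factory=set)
    divergences: List[Divergence] = field(default_factory=list)

    def enable_dual_write(self, tenant: str) -> None:
        self.dual_write_tenants.add(tenant)

    def write(self, tenant: str, key: str, value: Any) -> None:
        # The old store is always written, so rollback stays viable throughout.
        self.old_store[(tenant, key)] = value
        if tenant in self.dual_write_tenants:
            self.new_store[(tenant, key)] = value

    def read(self, tenant: str, key: str) -> Any:
        old = self.old_store.get((tenant, key))
        if tenant not in self.dual_write_tenants:
            return old
        # Correctness harness: compare both reads and log any mismatch.
        new = self.new_store.get((tenant, key))
        if old != new:
            self.divergences.append(Divergence(tenant, key, old, new))
        return new if tenant in self.read_new_tenants else old

    def cut_over(self, tenant: str) -> None:
        if tenant not in self.dual_write_tenants:
            raise ValueError("cutover requires dual-write to be enabled first")
        self.read_new_tenants.add(tenant)

    def roll_back(self, tenant: str) -> None:
        # Reads return to the old store, which never stopped receiving writes.
        self.read_new_tenants.discard(tenant)


proxy = DualWriteProxy()
proxy.write("clinic-a", "enc-1", {"status": "open"})    # old store only
proxy.enable_dual_write("clinic-a")
proxy.read("clinic-a", "enc-1")                         # logs a divergence: not yet backfilled
proxy.write("clinic-a", "enc-1", {"status": "closed"})  # both stores now agree
proxy.cut_over("clinic-a")
print(proxy.read("clinic-a", "enc-1"), len(proxy.divergences))
```

In this sketch the cutover is only a flag flip per tenant, which is one way a switch can complete in seconds once both stores already agree. Because writes keep landing in the old store after cutover, `roll_back` stays safe at any point in the migration.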
Outcome
- Total read downtime across all tenants: under ten seconds in aggregate.
- Divergences caught by the correctness harness before cutover: 412, all fixed.
- Divergences discovered after cutover: 2, both benign.
- The client's engineers ran the cutover unassisted; we watched from a shared channel.