Why TSB’s ¥3 Billion Data Migration Failed: Testing, Risk, and Lessons Learned
The article examines the disastrous 2018 TSB bank data migration, detailing how rushed system replication, inadequate testing, and complex micro‑service architecture caused massive data loss, regulatory fallout, and costly remediation, while highlighting broader lessons for IT operations and resilience in financial services.
In April 2018, the UK’s TSB bank suffered a failed migration that corrupted 1.3 billion customer records and cost roughly ¥2.9 billion (about £330 million) in compensation and remediation; the root cause was later identified as a lack of rigorous testing.
In 2018, years after its split from Lloyds Banking Group, TSB was still running on a hastily cloned copy of Lloyds’ IT system and paying about £100 million a year to license it. To break free, TSB launched a massive data migration on 22 April 2018, moving 5.4 million customers to Proteo4UK, a platform built by its parent company, Banco Sabadell.
01 Unprecedented Migration, Unprecedented Failure
Banco Sabadell’s chairman announced the plan at a large conference, emphasizing the scale of the Proteo4UK project with over 2,500 person‑years of effort and more than 1,000 technical experts.
After a weekend of system downtime, the new system went live on Sunday night, but within 20 minutes the first complaint arrived: customers saw missing or mis‑recorded transactions, and some accessed accounts that were not theirs.
The Financial Conduct Authority (FCA) and the Prudential Regulation Authority (PRA) were alerted, and investigations revealed that 1.3 billion records had been corrupted during the migration, disrupting service for millions of customers for weeks.
02 Migration Is Not As Simple As It Looks
Banking IT systems have evolved from manual ledger entries to complex, interconnected platforms involving ATMs, online and mobile banking, and numerous backend services.
Core banking typically runs on mainframes with many auxiliary systems, requiring high availability (four‑nines uptime) and robust disaster‑recovery mechanisms such as active‑active data centers.
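To make the "four‑nines" figure concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the active‑active formula assumes that sites fail independently and that failover is perfect, which real deployments rarely achieve.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def active_active_availability(site_availability: float, sites: int = 2) -> float:
    """Combined availability of redundant sites.

    Assumption: sites fail independently and traffic fails over instantly,
    so the service is down only when every site is down at the same time.
    """
    return 1.0 - (1.0 - site_availability) ** sites


if __name__ == "__main__":
    # Four nines (99.99%) leaves about 52.56 minutes of downtime per year.
    print(f"{downtime_budget_minutes(0.9999):.2f} minutes/year")
    # Two independent 99.9% sites combine to roughly 99.9999% on paper.
    print(f"{active_active_availability(0.999):.6f}")
```

Under these assumptions, a four‑nines target allows less than an hour of unplanned downtime per year, which is why weeks of disruption represent such a severe failure.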
This complexity, coupled with frequent changes, makes thorough testing essential; without it, a single error can propagate across the entire financial network.
03 Post‑mortem Analysis
IBM’s early report cited the combination of advanced micro‑service usage and active‑active data centers as the source of multiple production risks.
Experts stressed that rigorous regression testing, dedicated testing teams, and clear change‑management processes are vital to avoid such failures.
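As an illustration of the kind of automated check a migration test suite might include, the sketch below reconciles source and target records after a trial migration. It is not TSB's or Sabadell's actual procedure; the record layout and the function names `record_digest` and `reconcile` are assumptions made for this example.

```python
import hashlib


def record_digest(record: dict) -> str:
    """SHA-256 digest of a record, independent of key order."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: dict, target: dict) -> dict:
    """Compare two {record_id: record} mappings and report discrepancies."""
    shared_ids = source.keys() & target.keys()
    return {
        # Present in the old system but absent from the new one.
        "missing": sorted(source.keys() - target.keys()),
        # Present in the new system with no counterpart in the old one.
        "unexpected": sorted(target.keys() - source.keys()),
        # Present in both, but the contents differ.
        "mismatched": sorted(
            rid for rid in shared_ids
            if record_digest(source[rid]) != record_digest(target[rid])
        ),
    }


if __name__ == "__main__":
    # Hypothetical sample data: A2 is altered, A3 is lost, A4 appears from nowhere.
    source = {
        "A1": {"owner": "alice", "balance": "100.00"},
        "A2": {"owner": "bob", "balance": "250.50"},
        "A3": {"owner": "carol", "balance": "75.00"},
    }
    target = {
        "A1": {"balance": "100.00", "owner": "alice"},
        "A2": {"owner": "bob", "balance": "205.50"},
        "A4": {"owner": "dave", "balance": "10.00"},
    }
    print(reconcile(source, target))
```

A check like this would run against every rehearsal migration, and any non‑empty list would block the go‑live. At the scale of billions of records it would need to be batched or pushed down into the databases, but the principle is the same.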
Regulators noted a 187% increase in technology incidents reported by UK financial‑services firms from 2017 to 2018, largely due to poor change management.
Ultimately, the TSB incident underscores the high cost of insufficient testing and the need for resilient, well‑tested IT operations in banking.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.