
Why TSB’s $3 Billion Data Migration Failed: Testing, Risk, and Lessons Learned

The article examines the disastrous 2018 TSB bank data migration, detailing how rushed system replication, inadequate testing, and complex micro‑service architecture caused massive data loss, regulatory fallout, and costly remediation, while highlighting broader lessons for IT operations and resilience in financial services.

Efficient Ops

Jan 19, 2020


In April 2018, the UK’s TSB bank suffered a failed migration involving some 1.3 billion customer records. The fallout cost the bank roughly £330 million (about ¥2.9 billion) in compensation and related costs; the root cause was later identified as a lack of rigorous testing.

TSB had been split from Lloyds Banking Group, but in 2018 it was still tied to its former parent: it ran on a hastily cloned copy of Lloyds’ IT system and paid about £100 million a year in licensing fees. To break free, TSB launched a massive data migration on 22 April 2018, moving 5.4 million customers to Banco Sabadell’s Proteo4UK platform.

01 Unprecedented Migration, Unprecedented Failure

Banco Sabadell’s chairman announced the plan at a large conference, emphasizing the scale of the Proteo4UK project with over 2,500 person‑years of effort and more than 1,000 technical experts.

After a weekend of system downtime, the new system went live on Sunday night, but within 20 minutes the first complaint arrived: customers saw missing or mis‑recorded transactions, and some accessed accounts that were not theirs.

The Financial Conduct Authority (FCA) and the Prudential Regulation Authority (PRA) were alerted. Investigations found that data had been corrupted during the migration of some 1.3 billion records, and millions of customers faced weeks of service disruption.

02 Migration Is Not As Simple As It Looks

Banking IT systems have evolved from manual ledger entries to complex, interconnected platforms involving ATMs, online and mobile banking, and numerous backend services.

Core banking typically runs on mainframes surrounded by many auxiliary systems. It demands high availability (four-nines, or 99.99%, uptime) and robust disaster-recovery mechanisms such as active-active data centers.
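To make the four-nines target concrete, the short sketch below converts an availability figure into a yearly downtime budget and estimates the combined availability of an active-active pair. The function names are illustrative, and the assumption that the two sites fail independently is an idealization that real data centers rarely meet.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(site_availability: float, sites: int = 2) -> float:
    """Availability of N sites when any single site can serve all traffic.

    Assumes site failures are independent (an idealization): the service
    is down only when every site is down at the same time.
    """
    return 1.0 - (1.0 - site_availability) ** sites


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("four nines", 0.9999)]:
        print(f"{label}: {downtime_budget_minutes(a):.1f} minutes/year")
    print(f"two 99.9% sites: {combined_availability(0.999):.6f}")
```

Four nines leaves a budget of roughly 52.6 minutes of downtime per year, which puts the weeks of disruption TSB customers experienced into perspective.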

Complexity, coupled with frequent changes, makes thorough testing essential; without it, errors propagate across the entire financial network.

03 Post‑mortem Analysis

IBM’s preliminary report found that combining an advanced micro-service architecture with active-active data centers had introduced multiple production risks.

Experts stressed that rigorous regression testing, dedicated testing teams, and clear change‑management processes are vital to avoid such failures.
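One check that a migration test suite might include is record-level reconciliation between the old and new systems. The sketch below is a minimal illustration, not a description of what TSB or Sabadell actually ran: the record layout, the account IDs, and the `fingerprint` and `reconcile` helpers are all hypothetical.

```python
import hashlib


def fingerprint(record: dict) -> str:
    """Hash a record's fields in a stable (sorted-key) order."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: dict, target: dict) -> dict:
    """Compare two {account_id: record} snapshots after a migration.

    Returns IDs missing from the target, IDs that appear only in the
    target, and IDs whose contents differ between the two systems.
    """
    missing = sorted(k for k in source if k not in target)
    unexpected = sorted(k for k in target if k not in source)
    mismatched = sorted(
        k for k in source
        if k in target and fingerprint(source[k]) != fingerprint(target[k])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Hypothetical sample data: A2 was altered, A3 was dropped, A4 appeared.
old_system = {
    "A1": {"owner": "c1", "balance": "100.00"},
    "A2": {"owner": "c2", "balance": "250.00"},
    "A3": {"owner": "c3", "balance": "0.00"},
}
new_system = {
    "A1": {"owner": "c1", "balance": "100.00"},
    "A2": {"owner": "c2", "balance": "25.00"},
    "A4": {"owner": "c9", "balance": "10.00"},
}

report = reconcile(old_system, new_system)
```

An empty report is a reasonable go-live gate only if the comparison runs on production-scale data; a check like this catches dropped or altered records but says nothing about behavior under load or about customers seeing each other's accounts.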

Regulators noted a 187% increase in technical incidents reported by UK financial-services firms from 2017 to 2018, with poor change management cited as a leading cause.

Ultimately, the TSB incident underscores the high cost of insufficient testing and the need for resilient, well‑tested IT operations in banking.




