How We Overcame Real‑World Challenges in a Large‑Scale Oracle Database Cutover
This article recounts the migration of a core Oracle 10g database that had been in service for seven years. It covers the project background, team turnover, topology redesign, security constraints, data‑sync strategy, custom tools, high‑fidelity testing, unexpected failures, and the lessons learned for reliable operations.
Overview
This article starts from a real database cutover and walks through the project background, data‑sync solutions, tool development, simulation testing, and the psychology of a cutover, sharing the behind‑the‑scenes story of the engineering work.
Project Background
The enterprise support system had been running for over seven years. Its core Oracle 10g database sat on a minicomputer and a disk array that had never been changed. Rapid business growth, rising load, and aging hardware caused frequent failures: CPU idle time dropped to 0% and disk I/O ran high, putting the operations team under heavy pressure.
Main Difficulties
Difficulty 1: Team Turmoil
The original operations team left en masse. The new team inherited the system with no knowledge of it and faced heavy psychological and business pressure.
Difficulty 2: Topology Redesign
The original star topology placed the database and application server at the core, surrounded by about 100 collection servers. It had to be reworked into an architecture with separate internal and external networks.
Difficulty 3: Security Constraints
Upgrading Oracle from 10g to 11g brought in strict password policies (60‑day rotation, lockout after repeated failed logins), which conflicted with the need for uninterrupted service during the migration.
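One way to keep a 60‑day rotation from interrupting a long migration is to list, ahead of time, which service accounts will reach their limit before the migration window closes. The sketch below is minimal. The account names, dates, and function name are hypothetical. In practice the last‑change dates would come from your account records; Oracle's DBA_USERS view also exposes an EXPIRY_DATE column.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

PASSWORD_LIFE_DAYS = 60  # rotation period described in the article

def must_rotate_before(
    accounts: Dict[str, date], window_end: date, life_days: int = PASSWORD_LIFE_DAYS
) -> List[Tuple[str, date]]:
    """Return (account, expiry date) for passwords expiring on or before window_end.

    Accounts already expired are included too: they need attention before cutover.
    """
    hits = []
    for name, last_changed in accounts.items():
        expires = last_changed + timedelta(days=life_days)
        if expires <= window_end:
            hits.append((name, expires))
    # Earliest expiry first, so the most urgent rotations are handled first.
    return sorted(hits, key=lambda item: item[1])

# Hypothetical service accounts and their last password-change dates.
service_accounts = {
    "collector": date(2015, 1, 1),
    "report": date(2015, 3, 1),
    "batch": date(2014, 12, 1),
}
for name, expires in must_rotate_before(service_accounts, window_end=date(2015, 3, 5)):
    print(f"{name}: password expires {expires}, rotate before the window closes")
```

Rotating these passwords deliberately, with every client that uses them updated at the same time, avoids the situation where an expired or changed password turns into a stream of failed logins.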
Pre‑Migration Preparations
1. Strengthen Monitoring – Define key business and infrastructure metrics, and simulate high‑frequency jobs to expose hidden issues.
2. Train New Staff – Pause the migration to focus on onboarding and rebuilding the team.
3. Build a Global View
Redraw the system architecture from the team's own investigation.
Identify all stakeholders through extensive interviews.
Data Synchronization Strategy
The data was synchronized with a combination of Oracle GoldenGate (OGG), EXP/IMP, database links (DBLINK), and custom migration scripts.
Oracle GoldenGate
The initial plan was to rely entirely on OGG, but trials exposed its limitations:
The sheer volume of historical data made real‑time consistency hard to achieve.
Unpartitioned large tables and many unnecessary tables hindered extraction.
EXP/IMP
[Figure: example EXP/IMP commands]
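The following is a hedged illustration only, not the commands used in this migration. It shows how a table‑mode export from the old database and import into the new one could be assembled with the classic exp/imp utilities. The user, password, connect strings, table names, dump file name, and buffer size are all assumptions. The helper function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def exp_imp_commands(
    user: str, password: str, src_tns: str, dst_tns: str,
    tables: Sequence[str], dump: str = "migr_tables.dmp",
) -> Tuple[List[str], List[str]]:
    """Build argument lists for a table-mode exp (old DB) and imp (new DB).

    All values are illustrative assumptions; adapt them to your environment.
    """
    table_list = "(" + ",".join(tables) + ")"
    exp_cmd = [
        "exp", f"userid={user}/{password}@{src_tns}",
        f"tables={table_list}", f"file={dump}", "log=exp_tables.log",
        "rows=y",  # export the data, not just the table definitions
    ]
    imp_cmd = [
        "imp", f"userid={user}/{password}@{dst_tns}",
        f"tables={table_list}", f"file={dump}", "log=imp_tables.log",
        "ignore=y",         # keep loading rows if the tables were pre-created
        "commit=y",         # commit after each array insert, not once per table
        "buffer=10485760",  # 10 MB array buffer, which sizes those commits
    ]
    return exp_cmd, imp_cmd
```

Each list can be passed directly to `subprocess.run(cmd, check=True)` on a host with the Oracle client installed. Because the password is visible in the process list when given on the command line, a parameter file (the `parfile` option) is the safer choice in production.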
DATABASE LINK
Provides a simple, direct channel between the old and new databases.
Custom Migration Program
For massive tables (e.g., 40 million rows per partition), the data is sliced into batches of 100,000 rows that are copied in parallel. Each batch is committed on its own, so a failed batch can be retried quickly and undo tablespace usage never balloons.
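The batching pattern can be sketched as follows. This is a minimal sketch that uses Python's built‑in sqlite3 as a stand‑in for the two Oracle databases. The table layout (an integer key `id` plus a `payload` column), the key‑range slicing, the retry count, and the function names are assumptions for illustration, not the article's actual program.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

BATCH_SIZE = 100_000  # rows per batch, the figure used in the article

def make_batches(min_id: int, max_id: int, batch_size: int = BATCH_SIZE) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_id, max_id] into half-open [lo, hi) slices."""
    return [(lo, min(lo + batch_size, max_id + 1))
            for lo in range(min_id, max_id + 1, batch_size)]

def copy_batch(src_path: str, dst_path: str, table: str, lo: int, hi: int, retries: int = 3) -> int:
    """Copy one slice in its own short transaction; retry the whole slice on failure.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    for attempt in range(1, retries + 1):
        src = sqlite3.connect(src_path, timeout=30)
        dst = sqlite3.connect(dst_path, timeout=30)
        try:
            rows = src.execute(
                f"SELECT id, payload FROM {table} WHERE id >= ? AND id < ?", (lo, hi)
            ).fetchall()
            # Clearing the slice first makes a retry idempotent (no duplicate keys).
            dst.execute(f"DELETE FROM {table} WHERE id >= ? AND id < ?", (lo, hi))
            dst.executemany(f"INSERT INTO {table} (id, payload) VALUES (?, ?)", rows)
            dst.commit()  # small commit: little to roll back, cheap to redo
            return len(rows)
        except sqlite3.Error:
            dst.rollback()
            if attempt == retries:
                raise
        finally:
            src.close()
            dst.close()
    return 0  # not reached: the loop either returns or re-raises

def migrate_table(src_path: str, dst_path: str, table: str,
                  batch_size: int = BATCH_SIZE, workers: int = 4) -> int:
    """Copy a whole table slice by slice, several slices in parallel."""
    src = sqlite3.connect(src_path)
    try:
        min_id, max_id = src.execute(f"SELECT MIN(id), MAX(id) FROM {table}").fetchone()
    finally:
        src.close()
    if min_id is None:
        return 0  # empty table
    batches = make_batches(min_id, max_id, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: copy_batch(src_path, dst_path, table, *b), batches))
```

Against Oracle, the slices would more likely follow partitions, ROWID ranges, or an indexed key. The principle stays the same: each unit of work is small, independent, and safe to rerun.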
Tool Development & Testing
1. Forwarding Component
A daemon on the jump server listens on a port and forwards each connection to the next hop.
[Figure: forwarding component pseudocode]
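A minimal sketch of such a forwarder, assuming plain TCP relaying with Python's asyncio streams. The listen port 15210, the next‑hop address 10.0.0.20, and the function names are hypothetical; 1521 is Oracle's default listener port.

```python
import asyncio

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then half-close the far side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()  # keep the opposite direction open (TCP half-close)
    except OSError:
        writer.close()  # peer reset or connection lost: tear this direction down

async def handle_client(client_r, client_w, next_host: str, next_port: int) -> None:
    """Open a connection to the next hop and relay bytes both ways."""
    try:
        up_r, up_w = await asyncio.open_connection(next_host, next_port)
    except OSError:
        client_w.close()  # next hop unreachable: drop the client
        return
    await asyncio.gather(pipe(client_r, up_w), pipe(up_r, client_w))
    up_w.close()
    client_w.close()

async def start_forwarder(listen_host: str, listen_port: int, next_host: str, next_port: int):
    """Listen on listen_host:listen_port and relay every connection to next_host:next_port."""
    return await asyncio.start_server(
        lambda r, w: handle_client(r, w, next_host, next_port), listen_host, listen_port
    )

async def main() -> None:
    # Hypothetical addresses: jump-server port 15210 relaying to a new DB listener.
    server = await start_forwarder("0.0.0.0", 15210, "10.0.0.20", 1521)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```

The relay treats traffic as opaque bytes, so it works for Oracle Net or any other TCP protocol. Chaining several such daemons gives the multi‑hop path across the separated internal and external networks.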
2. Cutover Tools
Scripts automated configuration collection, path monitoring, pre‑ and post‑cutover checks, connection switching and rollback, and load testing.
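The switch‑and‑rollback idea can be sketched as a list of reversible steps guarded by checks. The sketch makes three assumptions. Each step supplies its own undo action. Pre‑checks run before anything changes. A failed post‑check rolls back every completed step in reverse order. `Step`, `run_cutover`, and the check names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]  # (name, predicate)

@dataclass
class Step:
    """One reversible cutover action, e.g. repointing a connection string."""
    name: str
    apply: Callable[[], None]
    rollback: Callable[[], None]

def run_cutover(steps: List[Step], pre_checks: List[Check], post_checks: List[Check]) -> bool:
    """Apply steps in order; on any failure, undo the completed ones and return False."""
    failed = [name for name, check in pre_checks if not check()]
    if failed:
        print(f"pre-checks failed, nothing changed: {failed}")
        return False
    done: List[Step] = []
    try:
        for step in steps:
            step.apply()
            done.append(step)  # a step that raised is not recorded as done
        failed = [name for name, check in post_checks if not check()]
        if failed:
            raise RuntimeError(f"post-checks failed: {failed}")
        return True
    except Exception as exc:
        print(f"rolling back: {exc}")
        # Undo in reverse order so later steps unwind before the ones they depend on.
        for step in reversed(done):
            step.rollback()
        return False
```

Keeping every step reversible and the rollback path automatic is what makes rehearsing the rollback as cheap as rehearsing the cutover itself.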
3. High‑Fidelity Simulation
All collection servers ingested data into both the old and new databases in parallel. This reproduced production concurrency and exposed performance bottlenecks before the actual cutover.
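A minimal sketch of the dual‑write idea from a collector's point of view. It assumes each database write is wrapped in a callable. The new database is treated as a "shadow" whose failures are counted but never allowed to break production ingestion. Latency is recorded for both targets so they can be compared. `DualWriter` and its methods are hypothetical names.

```python
import time
from statistics import median
from typing import Any, Callable, Dict, List

class DualWriter:
    """Send every record to production (primary) and to the new database (shadow)."""

    def __init__(self, primary: Callable[[Any], None], shadow: Callable[[Any], None]):
        self.primary, self.shadow = primary, shadow
        self.latency: Dict[str, List[float]] = {"primary": [], "shadow": []}
        self.shadow_errors = 0

    def _timed(self, key: str, fn: Callable[[Any], None], record: Any) -> None:
        start = time.perf_counter()
        try:
            fn(record)
        finally:
            # Record latency even for failed writes; slow failures matter too.
            self.latency[key].append(time.perf_counter() - start)

    def write(self, record: Any) -> None:
        self._timed("primary", self.primary, record)  # production errors still propagate
        try:
            self._timed("shadow", self.shadow, record)
        except Exception:
            self.shadow_errors += 1  # a shadow failure must never break production

    def report(self) -> Dict[str, Dict[str, Any]]:
        """Per-target write count, median latency, and worst latency in seconds."""
        return {
            key: {"n": len(v), "median_s": median(v) if v else None, "max_s": max(v, default=None)}
            for key, v in self.latency.items()
        }
```

This sketch writes to the two targets one after the other for simplicity. A real collector might push shadow writes onto a separate queue or thread so that a slow new database cannot add latency to the production path.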
Unexpected Failures
The first cutover worked functionally, but connection latency rose 100‑fold and throughput for key business flows dropped dramatically. Investigation found many hung processes and table locks, which forced a rollback.
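One way to quantify a regression like this is to compare median TCP connect times to the old and new listeners before and after a switch. The sketch below uses hypothetical function names. It measures only the TCP handshake, not Oracle authentication or session setup. That separates a network‑path problem from a slowdown inside the database.

```python
import socket
import time
from statistics import median

def connect_latency(host: str, port: int, samples: int = 20, timeout: float = 3.0) -> float:
    """Median TCP connect time in seconds (handshake only, no database logon)."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            times.append(time.perf_counter() - start)
    return median(times)

def slowdown(old_latency: float, new_latency: float) -> float:
    """How many times slower the new path is (100.0 means a 100-fold increase)."""
    return new_latency / old_latency
```

If the handshake times match but full logons are still slow, the delay lies beyond the network path, for example in session setup, login handling, or locking inside the database.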
Ghost Process
A forgotten monitoring script kept making failed login attempts. These triggered Oracle 11g’s failed‑login delay and lockout policy, which locked accounts and left the new database unusable.
Post‑Cutover Insights
Thorough configuration checks, simulated execution, and a well‑practiced rollback plan allowed rapid recovery.
Building an inventory of every process that connects to the database is what flushes out hidden “ghost” processes.
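One way to build that inventory is to tally listener log entries by client program and source address. The sketch assumes the classic plain‑text listener.log layout shown in the comment; adjust the patterns if your log differs. The function names are hypothetical. Because the listener logs each connection request before authentication happens, a script that keeps failing to log in should still appear in these counts.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed listener.log line layout:
# <timestamp> * (CONNECT_DATA=...(CID=(PROGRAM=...)(HOST=...)(USER=...))) *
#   (ADDRESS=(PROTOCOL=tcp)(HOST=<client ip>)(PORT=<port>)) * establish * <service> * <code>
PROGRAM_RE = re.compile(r"\(PROGRAM=([^)]*)\)", re.IGNORECASE)
CLIENT_IP_RE = re.compile(r"\(ADDRESS=\(PROTOCOL=tcp\)\(HOST=([^)]+)\)", re.IGNORECASE)

def tally_clients(lines: Iterable[str]) -> Counter:
    """Count connection requests per (client program, client IP)."""
    counts: Counter = Counter()
    for line in lines:
        if "establish" not in line:
            continue  # skip service_update, status, and other non-connect entries
        program = PROGRAM_RE.search(line)
        client_ip = CLIENT_IP_RE.search(line)
        if program and client_ip:
            counts[(program.group(1) or "<unknown>", client_ip.group(1))] += 1
    return counts

def print_report(path: str, top: int = 20) -> None:
    """Print the busiest (program, IP) pairs; unexpected entries are ghost candidates."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for (program, ip), n in tally_clients(fh).most_common(top):
            print(f"{n:8d}  {ip:15s}  {program}")
```

Any program and address pair that nobody on the team can account for is a candidate ghost process. This is worth checking before cutover, not after.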
Conclusion
Although the migration was not perfect, several key takeaways emerged:
Knowledge Graph – Documentation alone is insufficient; practitioners need a holistic view of tools and their interactions.
Flexibility – Creative alternatives (network separation, permission negotiations, real‑time data copy) are essential.
Resilience – Persistence, willingness to confront unknowns, and acceptance of failure are critical for success.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles focused on operations transformation.