How We Monitored and Optimized Databases During a New‑Old System Switch (Part 1)
During a high‑traffic migration where QPS peaked over 10,000, the team used DBDoctor to perform full‑stack database monitoring, pinpoint long‑running transactions and slow SQL, apply index recommendations, and achieve cost reductions of up to 246 000 times, demonstrating rapid, data‑driven performance optimization.
Background
During a May Day deployment the team performed the first cut‑over from a legacy system to a new system. The migration required real‑time bidirectional data synchronization, compatibility between differing schemas, and a QPS peak exceeding 10,000, stressing database stability. DBDoctor was introduced for full‑link performance monitoring and diagnostic optimization.
DBDoctor Overview
DBDoctor is a kernel‑level database performance diagnostic tool that performs fine‑grained scans, enabling minute‑level bottleneck location and hundred‑fold diagnostic efficiency improvement. Its core capabilities are:
Mathematical quantification model : quantifies per‑SQL resource consumption, replacing experience‑driven thresholds.
Visual resource mapping : maps consumption to colored areas for rapid pinpointing.
Proactive performance alerts : uses algorithms to discover potential risks and suggest optimizations.
Deployment
Mac installation
Disable Gatekeeper: sudo spctl --master-disable Trust the installer in System Settings → Privacy & Security , then double‑click the .pkg file. After installation a static HTML page shows version, install path, access URL and uninstall instructions (e.g., file:///private/tmp/dbdoctor_completion.html). Log in with an email address to enter the system.
Practical use during migration
Instance integration and diagnosis
After login, the Instance List page provides pre‑configured database instances. Selecting Instance Diagnosis opens detailed diagnostics for the chosen instance.
Performance insights
The Performance Insights module offered real‑time monitoring with:
Time range : supports granular views such as the last hour with auto‑refresh.
Average Active Session (AAS) : breaks load down by CPU, DDL, IO, LOCK, MEM, NET, Other.
Max vCPU waterline : shows stage‑wise time distribution (executing, waiting for handler commit, opening tables, etc.).
Bidirectional sync doubled read/write volume, causing QPS to exceed 10,000 and creating significant pressure on the database.
SQL correlation analysis displayed each SQL statement and its AAS share, allowing quick identification of high‑consumption queries. Business request distribution grouped load by database account or host IP to separate old‑system from new‑system traffic.
Root‑cause diagnosis
Long transaction : execution time > 98 s, likely due to missing or improper indexes, with risks of table locks, CPU and IO saturation.
Fix recommendations
Long transaction optimization : optimize slow SQL inside the transaction or split the transaction; if the system is already stuck, kill the session urgently.
SQL syntax and performance : detected 2 severe issues, 7 warnings, and 0 dangerous operations (details available in the UI).
Slow‑SQL management
The Slow SQL Trend view combines multi‑dimensional monitoring to show performance fluctuations. Duration distribution is grouped into four buckets: 1–3 s, 3–5 s, 5–10 s, and >10 s, with drill‑down to the specific SQL list.
DBDoctor clusters similar statements and generates a unique SQL fingerprint . Example:
-- Original SQL
select * from table where id=1;
select * from table where id=2;
-- SQL fingerprint
select * from table where id=?;Deep analysis of a slow query provides the real‑time execution plan, index recommendations, and root‑cause details. One diagnosed query lacked an index on the progress table:
ALTER TABLE progress ADD INDEX dbdoctor_idx__project_id(`project_id`) ALGORITHM=INPLACE, LOCK=NONE;After adding the index the cost dropped from 27,068.3 to 1.1, a performance gain of roughly 2.46 million‑fold. Additional findings:
Excessive scanned rows caused by large request ranges or data skew.
Recommendation to use primary‑key or indexed columns in WHERE clauses.
Storage analysis
Disk utilization : no anomalies detected.
TOP5 table space usage : single tables approach 6 GB.
TOP5 database space usage : single databases approach 20 GB.
Index recommendations
The index recommendation module identified 7 tables with optimization potential and generated executable DDL statements together with the underlying recommendation logic.
Outcome
Real‑time monitoring : full‑link performance insight during a bidirectional sync scenario with QPS > 10 k.
Precise diagnosis : minute‑level location of long transactions and slow SQL ensured compatibility during cut‑over.
Efficient optimization : index addition and SQL tuning reduced key query cost by more than 99.99 %.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
