Design and Implementation of a MySQL Slow Query Log Analysis System on the CloudDB Platform
The article describes the architecture, core functions, and workflow of a MySQL slow‑query analysis system built for the CloudDB platform, covering both daily report generation and real‑time slow‑SQL monitoring using tools such as pt‑query‑digest, ELK, and Kafka.
MySQL slow‑query logs record SQL statements whose execution time exceeds a threshold, and they are crucial for database performance optimization. In the CloudDB platform of the 58 Group, a slow‑SQL system was developed to assist both DBAs and developers in quickly locating performance issues.
The system provides two core features: a daily slow‑query report that aggregates the previous day's slow SQLs, and a real‑time slow‑SQL view that streams live slow queries to developers.
For log analysis the platform adopts the third‑party tool pt‑query‑digest (instead of the built‑in mysqldumpslow ) because it offers richer attributes. pt‑query‑digest supplies two useful functions: review (parameter‑free grouping and statistics) and history (recording metrics such as execution time and lock time for trend analysis).
The daily report workflow includes:
Using CMDB information to identify database clusters and slice the previous day's slow‑log files.
Fetching the sliced logs from each cluster via ansible to a central analysis server.
Running pt‑query‑digest to parse the logs and store results in two MySQL tables tb_slowlog_review and tb_slowlog_review_history .
Displaying the aggregated data on the CloudDB management console.
Calling SQLAdvisor to generate optimization suggestions.
Key modules exposed to users are:
Slow‑SQL quantity statistics and optimization suggestions : trend charts of total slow‑SQL counts per cluster, sortable dimensions such as execution count and average time, and corresponding tuning advice.
Individual slow‑SQL detail tracking : per‑SQL trend, first/last occurrence, source IP, business cluster, owner, execution count, etc., to help quickly pinpoint problematic queries.
Additional features include daily email push of slow‑SQL reports and aggregated statistics.
To overcome the limitation of the daily report (inability to view same‑day queries), a real‑time slow‑SQL subsystem was built using ELK and Kafka. The collection pipeline is:
filebeat watches slow‑log files and streams changes to kafka .
logstash consumes the Kafka stream, filters and parses the logs, and stores them in Elasticsearch.
The CloudDB platform reads from Elasticsearch and presents the live slow‑SQL data to developers.
This enables developers to monitor each slow query as it occurs in production, facilitating immediate performance tuning.
Conclusion : MySQL slow queries are a critical performance factor; the CloudDB platform provides a comprehensive slow‑query analysis solution—including daily reports, real‑time monitoring, quantitative analysis, and actionable recommendations—to support DBAs and developers in optimizing database access.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.