
Meituan Database Attack‑Defense Practice: Kernel Observability, Full SQL, and Index Optimization

The article details how Meituan built a MySQL autonomous platform: it constructed kernel observability to split SQL latency into On‑CPU and Off‑CPU time, captured full SQL directly from the kernel with compression, designed a safe exception‑handling workflow, and generated cost‑based index‑tuning suggestions (including what‑if analysis and workload‑driven recommendations) to enable comprehensive SQL governance.

Meituan Technology Team

Jul 6, 2023


0 Review of the Previous Part

The first part covered database anomaly detection and diagnosis, emphasizing cases where root causes were difficult to pinpoint. This second part explains how kernel observability, full‑SQL capture, exception handling, and index‑tuning address those challenges.

1 Kernel Observability Construction

1.1 Challenges

Some SQL performance jitter has no obvious cause: execution time spikes even though the plan is good, few rows are examined, and there is no lock contention.

Metric‑anomaly diagnosis relies on snapshot data (e.g., information_schema.processlist or slow‑query logs), which provides only a partial view of the execution.

1.2 Wait‑Time Quantification

Inspired by Brendan Gregg’s TSA (Thread State Analysis) method (https://www.brendangregg.com/tsamethod.html), total SQL latency is split into On‑CPU and Off‑CPU components. If most of the latency is On‑CPU, the problem lies in the query itself (e.g., full scans or poor optimizer cost estimates). If Off‑CPU time dominates, external factors such as lock waits or resource contention are the likely cause.

On‑CPU time is measured by calling getrusage before and after query execution and taking the difference in ru_utime (user time) and ru_stime (system time). Off‑CPU time is analyzed with lightweight wait metrics collected inside the database kernel through custom instrumentation that mirrors performance_schema but provides per‑SQL granularity with negligible overhead.
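A minimal sketch of the On‑CPU/Off‑CPU split, written in Python purely for illustration (the production instrumentation lives inside the MySQL server): sample getrusage before and after a query, and treat whatever wall‑clock time was not spent on the CPU as Off‑CPU time. The helper names are hypothetical, and per‑thread accounting via RUSAGE_THREAD assumes Linux.

```python
import resource
import time

# Assumption: Linux per-thread accounting (RUSAGE_THREAD). Other platforms fall back
# to process-wide accounting, which is only accurate for single-threaded callers.
_WHO = getattr(resource, "RUSAGE_THREAD", resource.RUSAGE_SELF)


def cpu_seconds() -> float:
    """User + system CPU time consumed so far (ru_utime + ru_stime)."""
    ru = resource.getrusage(_WHO)
    return ru.ru_utime + ru.ru_stime


def split_latency(run_query) -> dict:
    """Run run_query() and split its wall-clock latency into On-CPU and Off-CPU time."""
    wall_start = time.perf_counter()
    cpu_start = cpu_seconds()
    run_query()
    on_cpu = cpu_seconds() - cpu_start
    total = time.perf_counter() - wall_start
    # Time not spent on the CPU was spent waiting (locks, IO, run queue, sleep, ...).
    off_cpu = max(total - on_cpu, 0.0)
    return {"total": total, "on_cpu": on_cpu, "off_cpu": off_cpu}


def dominant_side(split: dict) -> str:
    """'on-cpu' points at the query itself; 'off-cpu' points at external waits."""
    return "on-cpu" if split["on_cpu"] >= split["off_cpu"] else "off-cpu"


if __name__ == "__main__":
    # A stand-in "query" that only sleeps spends almost all of its time Off-CPU.
    print(dominant_side(split_latency(lambda: time.sleep(0.2))))
```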

The metric hierarchy separates Statement‑level indicators (e.g., QUERY_TIME, ROWS_EXAMINED) from Wait‑level indicators (e.g., mutex, latch, and IO waits); the original article illustrates this hierarchy in a diagram.
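The two levels can be pictured as a nested record. The sketch below is a hypothetical data layout, not the platform's actual schema: a statement carries its Statement‑level indicators plus a Wait‑level breakdown, so any time explained by neither CPU nor recorded waits stands out.

```python
from dataclasses import dataclass, field


@dataclass
class WaitMetrics:
    """Wait-level indicators for one statement, in seconds (hypothetical field names)."""
    mutex: float = 0.0
    latch: float = 0.0
    io: float = 0.0

    def total(self) -> float:
        return self.mutex + self.latch + self.io


@dataclass
class StatementMetrics:
    """Statement-level indicators plus the Wait-level breakdown beneath them."""
    sql: str
    query_time: float   # QUERY_TIME, seconds
    rows_examined: int  # ROWS_EXAMINED
    cpu_time: float     # On-CPU seconds (ru_utime + ru_stime)
    waits: WaitMetrics = field(default_factory=WaitMetrics)

    def unexplained(self) -> float:
        """Time covered neither by CPU nor by a recorded wait (e.g. scheduler delay)."""
        return max(self.query_time - self.cpu_time - self.waits.total(), 0.0)

    def top_wait(self) -> str:
        """Name of the wait class that contributed the most time."""
        breakdown = {"mutex": self.waits.mutex, "latch": self.waits.latch, "io": self.waits.io}
        return max(breakdown, key=breakdown.get)


if __name__ == "__main__":
    s = StatementMetrics("UPDATE t SET c = 1 WHERE id = 7", query_time=0.50,
                         rows_examined=1, cpu_time=0.02,
                         waits=WaitMetrics(mutex=0.01, latch=0.05, io=0.40))
    print(s.top_wait(), round(s.unexplained(), 3))
```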

2 Full‑SQL Capture

Full‑SQL means collecting every SQL statement issued by applications together with detailed execution metrics. The earlier approach, which parsed SQL out of TCP packets, could not obtain most of these metrics, making diagnosis difficult.

The new solution extracts the SQL text and more than 100 kernel‑level metrics directly from the thd structure, pushes them into a lock‑free queue, and has a dedicated output thread write them to a full‑SQL file.
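The capture path is a classic producer/consumer design: query threads only enqueue, and a single writer thread owns the file. In this Python sketch, queue.Queue (which uses locks internally) stands in for the lock‑free queue, and the JSON‑lines format and field names are assumptions.

```python
import json
import os
import queue
import tempfile
import threading


class FullSqlWriter:
    """Query threads enqueue records; one dedicated thread writes them to the file.

    Assumption: queue.Queue (lock-based) stands in for the article's lock-free queue.
    """

    _STOP = object()  # sentinel telling the writer thread to finish

    def __init__(self, path: str):
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._file = open(path, "w", encoding="utf-8")
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def record(self, sql: str, **metrics) -> None:
        """Called on the query path: enqueue and return at once, no file IO here."""
        self._queue.put({"sql": sql, **metrics})

    def _drain(self) -> None:
        # The output thread is the only code that touches the file.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._file.write(json.dumps(item) + "\n")
        self._file.close()

    def close(self) -> None:
        """Flush everything queued so far, then stop the writer thread."""
        self._queue.put(self._STOP)
        self._thread.join()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "full_sql.log")
    writer = FullSqlWriter(path)
    writer.record("SELECT 1", query_time_us=120, rows_examined=1)
    writer.close()
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

Keeping file IO off the query path is the point of the design: record() costs one enqueue, so a slow disk delays the writer thread rather than user queries.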

Because daily data can reach petabyte scale, the pipeline uses an rds‑agent to read the file and compresses it with Snappy. By sorting SQL texts by their first N characters (default N=50) before compression, the compression ratio improves from ~2× to 7–8×.
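The effect of prefix sorting is easy to reproduce. This sketch uses zlib from the Python standard library as a stand‑in for Snappy and a synthetic workload of hypothetical statement templates, so its ratios will not match the article's figures; sorting by the first 50 characters still compresses noticeably better because statements with the same shape end up adjacent, inside the compressor's window.

```python
import random
import zlib


def synthetic_workload(n_templates=400, n_statements=4000, seed=7):
    """Hypothetical workload: many distinct statement templates with varying literals."""
    rng = random.Random(seed)

    def ident() -> str:
        return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))

    templates = [
        "SELECT " + ", ".join(ident() for _ in range(10)) + f" FROM t_{ident()} WHERE id = "
        for _ in range(n_templates)
    ]
    return [rng.choice(templates) + str(rng.randint(1, 10**6)) for _ in range(n_statements)]


def compression_ratio(statements, sort_prefix_len=None) -> float:
    """Raw size / compressed size, optionally after sorting by the first N characters."""
    if sort_prefix_len is not None:
        # Identical prefixes become neighbours, so repeats fall within the match window.
        statements = sorted(statements, key=lambda s: s[:sort_prefix_len])
    blob = "\n".join(statements).encode("utf-8")
    return len(blob) / len(zlib.compress(blob, 6))


if __name__ == "__main__":
    sqls = synthetic_workload()
    print(f"unsorted: {compression_ratio(sqls):.1f}x")
    print(f"sorted by first 50 chars: {compression_ratio(sqls, 50):.1f}x")
```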

3 Exception Handling

After root‑cause analysis, remediation actions are classified as “lossless” (e.g., disk‑space cleanup, parameter tuning, adding a missing index) or “lossy” (e.g., killing sessions, throttling traffic). Lossy actions go through an approval workflow before they are executed.
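One way to encode the lossless/lossy split is a lookup table plus an approval gate. The action names and the approve callback below are hypothetical; in practice the callback would open a ticket in the approval workflow and wait for a decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Impact(Enum):
    LOSSLESS = "lossless"  # e.g. disk cleanup, parameter tuning, adding a missing index
    LOSSY = "lossy"        # e.g. killing sessions, throttling traffic


# Hypothetical action catalogue built from the examples in the text.
ACTION_IMPACT = {
    "cleanup_disk": Impact.LOSSLESS,
    "tune_parameter": Impact.LOSSLESS,
    "add_missing_index": Impact.LOSSLESS,
    "kill_session": Impact.LOSSY,
    "throttle_sql": Impact.LOSSY,
}


@dataclass
class Outcome:
    action: str
    executed: bool
    reason: str


def remediate(action: str, execute: Callable[[], None],
              approve: Callable[[str], bool]) -> Outcome:
    """Run lossless actions directly; lossy ones only after approve() says yes."""
    impact = ACTION_IMPACT.get(action)
    if impact is None:
        return Outcome(action, False, "unknown action")  # never run unclassified actions
    if impact is Impact.LOSSY and not approve(action):
        return Outcome(action, False, "approval denied")
    execute()
    return Outcome(action, True, impact.value)
```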

For MySQL hang scenarios, a high‑availability component periodically probes the primary; if it detects a hang, it automatically triggers a primary‑secondary switchover or replaces the instance.
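A sketch of the hang probe, assuming the probe runs a cheap statement with a timeout (for example against a heartbeat table) rather than only checking that the port accepts connections, since a hung instance may still accept connections. Requiring several consecutive failures avoids failing over on a single transient timeout; all names and thresholds here are hypothetical.

```python
import time
from typing import Callable


def watch_primary(probe: Callable[[float], bool], failover: Callable[[], None],
                  interval: float = 1.0, timeout: float = 3.0,
                  max_failures: int = 3, max_rounds: int = 0) -> bool:
    """Probe the primary periodically; trigger failover after N consecutive failures.

    probe(timeout) returns False if the statement errors or exceeds `timeout`.
    max_rounds=0 means probe forever. Returns True if failover was triggered.
    """
    failures = 0
    rounds = 0
    while max_rounds == 0 or rounds < max_rounds:
        rounds += 1
        # Any successful probe resets the streak; only consecutive failures count.
        failures = 0 if probe(timeout) else failures + 1
        if failures >= max_failures:
            failover()  # e.g. primary-secondary switchover or instance replacement
            return True
        time.sleep(interval)
    return False
```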

4 Index Tuning and Governance

4.1 Single‑SQL Cost‑Based Suggestions

The optimizer chooses the plan with the lowest estimated cost. For the query SELECT * FROM test_db.table1 WHERE c2=3 AND c3=4 AND c4<'3', the optimizer selected the plan that uses idx_c2 because its cost was lower than that of the alternatives.

Cost formulas (derived from Percona Server source) are:

Table‑scan:

IO_COST = pages * IO_BLOCK_READ_COST + rows * ROW_EVALUATE_COST

Index‑scan:

IO_COST = (records/keys_per_block) * IO_BLOCK_READ_COST + rows * ROW_EVALUATE_COST

Range‑access:

IO_COST = records_in_range * IO_BLOCK_READ_COST + 2*records_in_range * ROW_EVALUATE_COST

Ref:

IO_COST = prefix_rowcount * IO_BLOCK_READ_COST + prefix_rowcount * cur_fanout * ROW_EVALUATE_COST
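The formulas above can be transcribed directly to compare candidate plans. The constants are MySQL 5.7's default io_block_read_cost (1.0) and row_evaluate_cost (0.2); the table statistics in example_plans() are hypothetical and chosen only to show the optimizer keeping the cheapest plan, as it did with idx_c2.

```python
# MySQL 5.7 defaults for io_block_read_cost and row_evaluate_cost.
IO_BLOCK_READ_COST = 1.0
ROW_EVALUATE_COST = 0.2


def table_scan_cost(pages: float, rows: float) -> float:
    return pages * IO_BLOCK_READ_COST + rows * ROW_EVALUATE_COST


def index_scan_cost(records: float, keys_per_block: float, rows: float) -> float:
    return (records / keys_per_block) * IO_BLOCK_READ_COST + rows * ROW_EVALUATE_COST


def range_cost(records_in_range: float) -> float:
    return records_in_range * IO_BLOCK_READ_COST + 2 * records_in_range * ROW_EVALUATE_COST


def ref_cost(prefix_rowcount: float, cur_fanout: float) -> float:
    return (prefix_rowcount * IO_BLOCK_READ_COST
            + prefix_rowcount * cur_fanout * ROW_EVALUATE_COST)


def cheapest(plans: dict) -> str:
    """The optimizer keeps the plan with the lowest estimated cost."""
    return min(plans, key=plans.get)


def example_plans() -> dict:
    # Hypothetical statistics for test_db.table1: 100,000 rows in 1,000 pages,
    # 30,000 rows with c4 < '3', about 2,000 rows per c3 value and 50 per c2 value.
    return {
        "table scan": table_scan_cost(pages=1_000, rows=100_000),
        "range(idx_c4)": range_cost(records_in_range=30_000),
        "ref(idx_c3)": ref_cost(prefix_rowcount=1, cur_fanout=2_000),
        "ref(idx_c2)": ref_cost(prefix_rowcount=1, cur_fanout=50),
    }


if __name__ == "__main__":
    plans = example_plans()
    for name, cost in plans.items():
        print(f"{name:15s} {cost:10.1f}")
    print("chosen:", cheapest(plans))  # chosen: ref(idx_c2)
```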

To evaluate hypothetical indexes without creating them in production, the “what‑if” approach from Microsoft’s AutoAdmin utility (https://dl.acm.org/doi/10.1145/276304.276337) is adopted. Empty, metadata‑only indexes are created on a non‑production instance, and the storage‑engine functions scan_time(), records_in_range(), and info() are modified to return realistic statistics (e.g., sampled row counts, innodb_rec_per_key). The optimizer then computes plan costs as if the indexes existed.
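The core of the what‑if trick is that the optimizer only needs statistics, not index pages. This Python sketch models a metadata‑only index whose records_in_range() estimate is scaled up from a row sample, then costs a range plan with the range formula given earlier; the class and function names are hypothetical and do not correspond to Percona Server's C++ handler API.

```python
import bisect
import random
from typing import Dict, List, Sequence

IO_BLOCK_READ_COST = 1.0  # MySQL 5.7 default
ROW_EVALUATE_COST = 0.2   # MySQL 5.7 default


class HypotheticalIndex:
    """Metadata-only index: no B-tree pages, only statistics taken from a row sample."""

    def __init__(self, name: str, sample: Sequence[float], table_rows: int):
        self.name = name
        self.sorted_sample: List[float] = sorted(sample)
        self.table_rows = table_rows

    def records_in_range(self, low: float, high: float) -> float:
        """Estimate rows with low <= value < high by scaling up the sample fraction."""
        lo = bisect.bisect_left(self.sorted_sample, low)
        hi = bisect.bisect_left(self.sorted_sample, high)
        return (hi - lo) * self.table_rows / len(self.sorted_sample)


def what_if(index: HypotheticalIndex, low: float, high: float,
            table_pages: int, table_rows: int) -> Dict[str, float]:
    """Cost a full scan against a range scan on an index that does not exist yet."""
    est = index.records_in_range(low, high)
    return {
        "table scan": table_pages * IO_BLOCK_READ_COST + table_rows * ROW_EVALUATE_COST,
        f"range({index.name})": est * IO_BLOCK_READ_COST + 2 * est * ROW_EVALUATE_COST,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.randint(0, 999) for _ in range(2_000)]  # hypothetical sampled c4 values
    idx = HypotheticalIndex("idx_c4", sample, table_rows=100_000)
    for low, high in [(0, 10), (0, 500)]:
        costs = what_if(idx, low, high, table_pages=1_000, table_rows=100_000)
        print((low, high), min(costs, key=costs.get), costs)
```

A narrow range makes the hypothetical index worth creating; a wide one loses to the full scan, which is exactly the judgment the what‑if analysis lets the optimizer make before any index is built.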

4.2 Workload‑Driven Index Recommendations

When storage space is limited (e.g., 100 GB of index space), the platform selects a subset of indexes that maximizes overall performance. The pipeline consists of six components:

COLUMN GROUP RESTRICTION: Computes a CG‑Cost(g) for each column group g using optimizer‑estimated costs; groups with CG‑Cost ≥ threshold f are retained.

CANDIDATE INDEX SELECTION: For each query, the optimizer’s best index set becomes a candidate; candidates are merged across queries.

INDEX MERGING: Overlapping indexes are combined to reduce storage while preserving performance.

CONFIGURATION ENUMERATION: A hybrid brute‑force/greedy algorithm selects up to k indexes that minimize total cost under the size constraint.

MULTI‑COLUMN INDEX GENERATION: Iteratively expands single‑column candidates to double‑column and higher‑order indexes using the strategies MC_LEAD, MC_ALL, and MC_BASIC (MC_LEAD was shown to be the most effective).

FINAL INDEXES: The process stops when both the cost and storage limits are satisfied, outputting the optimal index set.
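The CONFIGURATION ENUMERATION step can be sketched in the style of the Greedy(m, k) algorithm from Microsoft's AutoAdmin work: search exhaustively for the best seed of at most m indexes, then add indexes one at a time while total cost keeps falling and the storage budget allows. The per‑query costs here are hypothetical numbers standing in for what‑if optimizer estimates, and each query is assumed to benefit from at most one index.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

# query -> (cost without any new index, {candidate index: cost if that index exists})
Workload = Dict[str, Tuple[float, Dict[str, float]]]

# Hypothetical what-if costs and index sizes (arbitrary units).
EXAMPLE_WORKLOAD: Workload = {
    "q1": (100.0, {"idx_a": 10.0, "idx_ab": 5.0}),
    "q2": (80.0, {"idx_b": 20.0, "idx_ab": 30.0}),
    "q3": (50.0, {"idx_c": 5.0}),
}
EXAMPLE_SIZES = {"idx_a": 10, "idx_b": 10, "idx_ab": 15, "idx_c": 10}


def workload_cost(workload: Workload, config: FrozenSet[str]) -> float:
    """Sum over queries of the cheapest plan available under this index configuration."""
    total = 0.0
    for base, with_index in workload.values():
        usable = [cost for idx, cost in with_index.items() if idx in config]
        total += min([base] + usable)
    return total


def greedy_m_k(workload: Workload, sizes: Dict[str, float], budget: float,
               m: int = 2, k: int = 4) -> Tuple[FrozenSet[str], float]:
    candidates = sorted(sizes)

    def fits(config: FrozenSet[str]) -> bool:
        return sum(sizes[i] for i in config) <= budget

    # Phase 1: exhaustive search over every seed of at most m indexes.
    best: FrozenSet[str] = frozenset()
    best_cost = workload_cost(workload, best)
    for r in range(1, min(m, k) + 1):
        for combo in combinations(candidates, r):
            config = frozenset(combo)
            if fits(config):
                cost = workload_cost(workload, config)
                if cost < best_cost:
                    best, best_cost = config, cost

    # Phase 2: repeatedly add the single index that lowers total cost the most.
    while len(best) < k:
        next_best = None
        for idx in candidates:
            config = best | {idx}
            if idx in best or not fits(config):
                continue
            cost = workload_cost(workload, config)
            if cost < best_cost:
                next_best, best_cost = config, cost
        if next_best is None:
            break
        best = next_best
    return best, best_cost


if __name__ == "__main__":
    print(greedy_m_k(EXAMPLE_WORKLOAD, EXAMPLE_SIZES, budget=25))
```

With a budget of 25 units, the sketch returns {idx_ab, idx_c} at total cost 40, the best configuration that fits; the exhaustive seed phase matters because idx_ab alone is the best single index, yet pairing it with idx_c beats any greedy path that starts elsewhere.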

4.3 SQL Governance

The platform provides three governance stages:

Pre‑deployment audit: CI/CD integrates a risky‑SQL check that blocks releases and suggests index improvements.

Real‑time detection: Rules (slow‑query count, execution time, rows scanned) and data‑model‑based anomaly detection flag problematic queries during execution and offer immediate remediation actions.

Post‑execution batch remediation: Historical SQL logs are analyzed with workload‑based recommendations; approved indexes are added automatically (e.g., via the open‑source gh-ost tool).
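The rule‑based half of real‑time detection can be as simple as threshold checks over per‑fingerprint statistics. The thresholds and field names below are hypothetical, and the data‑model‑based anomaly detection mentioned above is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SqlStats:
    """Aggregated statistics for one SQL fingerprint over a detection window."""
    fingerprint: str
    slow_count: int       # slow-query occurrences in the window
    avg_time_ms: float    # average execution time
    rows_examined: int    # average rows scanned per execution


@dataclass
class Rules:
    # Hypothetical thresholds; real values would be tuned per cluster.
    max_slow_count: int = 10
    max_avg_time_ms: float = 1000.0
    max_rows_examined: int = 100_000


def violations(stats: SqlStats, rules: Rules) -> List[str]:
    """Return the names of every rule this fingerprint breaks, in a fixed order."""
    found = []
    if stats.slow_count > rules.max_slow_count:
        found.append("slow-query count")
    if stats.avg_time_ms > rules.max_avg_time_ms:
        found.append("execution time")
    if stats.rows_examined > rules.max_rows_examined:
        found.append("rows scanned")
    return found
```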

5 References

Brendan Gregg, TSA (Thread State Analysis) Method: https://www.brendangregg.com/tsamethod.html

Percona Server source (release‑5.7.41‑44): https://github.com/percona/percona-server/tree/release-5.7.41-44

Microsoft AutoAdmin “what‑if” index analysis utility: https://dl.acm.org/doi/10.1145/276304.276337

Random Sampling for Histogram Construction (Microsoft research): https://www.microsoft.com/en-us/research/publication/random-sampling-for-histogram-construction-how-much-is-enough/

Additional related works on self‑driving databases and cost‑driven index selection are listed in the original article.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

