Databases 10 min read

How Machine Learning Can Automate MySQL Index Optimization

This article explains how applying machine learning to database operations—specifically AIOps for MySQL—can automate index recommendation by parsing SQL, extracting semantic and statistical features, generating candidate index combinations, and training an XGBoost model to predict optimal indexes, reducing reliance on manual DBA work.

Efficient Ops

Mar 10, 2024

How Machine Learning Can Automate MySQL Index Optimization

1. Background

In the DBA community, it is often said that mastering indexes solves 80% of database problems, highlighting the importance of indexes for relational databases. Traditional DBA work focuses on SQL optimization, which mainly involves index tuning and query rewriting. When query volumes reach billions of slow queries per day, manual tuning becomes impractical.

Rule‑based and cost‑based automatic index recommendation tools have emerged, but they struggle with diverse workloads, varying developer skill levels, and sometimes inaccurate statistics. To address these limitations, the project introduces machine learning into the traditionally manual DBA domain, known as AIOps.

2. Principles and Key Process

SQL optimization traditionally follows two paths: (1) analyzing fields used in queries, conditions, joins, and aggregations to decide which indexes to create; (2) rewriting queries or using hints to guide the optimizer. This project focuses on the first path—index recommendation.

Key points for index recommendation include identifying which fields are used for retrieval, which fields appear in multi‑table joins, and the statistical distribution of tables and columns.

2.1 Syntax Parsing & Semantic Feature Extraction

The project uses the Python sqlparse library to parse SQL statements and extract semantic features such as WHERE, JOIN, aggregation functions, ORDER BY, GROUP BY, and common equality or range predicates.

2.2 Statistical Feature Extraction

Beyond semantic features, the system extracts table and column statistics from the database metadata, such as total row count, column type, null count, distinct value count, and selectivity. For MySQL, statistics are stored in mysql.innodb_index_stats and mysql.innodb_table_stats, with histograms available in information_schema.column_statistics for MySQL 8.0.

2.3 Index Combination Generation

After extracting features, the system generates candidate index combinations, including single‑column, two‑column, and three‑column indexes (excluding >3 columns). Each SQL can use only one index per table, so all relevant fields are combined to produce candidate sets.

2.4 Feature Engineering & Modeling

Features for each candidate index consist of semantic features and statistical features. For multi‑column indexes, features of each column are concatenated in order, preserving column sequence. Positive samples are indexes actually used by a query; all others are negative.

2.5 Model Training and Prediction

The prepared training set undergoes preprocessing (missing value handling, zero‑row removal, type normalization). Separate XGBoost binary classification models are trained for single‑column and multi‑column indexes. Trained models are saved with joblib and loaded online for real‑time index recommendation. Feedback from deployments can be fed back to retrain and improve the model.

3. Future Plans

The current implementation supports only MySQL; future work includes extending to other relational databases such as openGauss.

With the rise of large language models, the project also plans to explore fine‑tuning open‑source LLMs for index recommendation to further enhance user experience.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Machine Learning feature engineering SQL mysql Index Optimization aiops

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.