Why MySQL Optimizer Chooses Full Table Scan Over Index Scan: Cost Analysis and Experiment
This article explains how MySQL's cost‑based optimizer decides between using an index and performing a full table scan, demonstrates the behavior with a large test table, shows optimizer trace details, and clarifies why index access can sometimes be more expensive than scanning the whole table.
1. Basic Conclusion
SQL execution cost is a key factor for the MySQL optimizer when selecting an execution plan; if the optimizer estimates that using an index costs more than a full table scan, it will choose the latter.
2. Problem Phenomenon
A table test03 with about 1.04 million rows is created:
CREATE TABLE `test03` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
`dept` tinyint(4) NOT NULL COMMENT '部门id',
`name` varchar(30) COLLATE utf8mb4_bin DEFAULT NULL COMMENT '用户名称',
`create_time` datetime NOT NULL COMMENT '注册时间',
`last_login_time` datetime DEFAULT NULL COMMENT '最后登录时间',
PRIMARY KEY (`id`),
KEY `ct_index` (`create_time`)
) ENGINE=InnoDB AUTO_INCREMENT=1048577 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='测试表';Two queries are run:
Query 1 does not use the ct_index (type=ALL, rows≈1,045,955).
Query 2 uses the ct_index (type=range, rows≈169).
# 查询1
explain select * from test03 where create_time > '2021-10-01 02:04:36';
# 查询2
explain select * from test03 where create_time < '2021-01-01 02:04:36';3. Obtaining Optimizer Trace Information
The optimizer_trace tool is enabled and the trace is retrieved after executing Query 1:
# Increase trace buffer size
set global optimizer_trace_max_mem_size = 1048576;
# Enable trace
set optimizer_trace="enabled=on";
# Execute SQL
select * from test03 where create_time > '2021-10-01 02:04:36';
# Show trace
select TRACE from INFORMATION_SCHEMA.OPTIMIZER_TRACE\GThe trace JSON shows a rows_estimation step where the cost of using the index versus a full scan is compared, and the optimizer chooses the full scan because the estimated cost is lower.
4. Why Index Cost Can Be Higher Than Full Scan
The index ct_index(create_time) does not cover all selected columns, so MySQL must perform a “row lookup” (using the primary key) after scanning the index, which adds overhead. When the proportion of rows that satisfy the condition is large (≈60% for Query 1), the extra random I/O of the lookup outweighs the benefit of the index, leading the optimizer to prefer a sequential full‑table scan.
Conversely, Query 2 returns only about 0.02% of rows, so the index lookup is cheap and the optimizer selects the index.
select (select count(*) from test03 where create_time > '2021-10-01 02:04:36')/ (select count(*) from test03) as '>20211001',
(select count(*) from test03 where create_time < '2021-01-01 02:04:36')/ (select count(*) from test03) as '<20210101';
-- Result: >20211001 = 0.5997, <20210101 = 0.0002Official MySQL documentation confirms that the optimizer considers many factors—table size, row count, I/O block size, and estimated cost—rather than a fixed data‑percentage threshold when deciding between index scan and full scan.
References: https://dev.mysql.com/doc/refman/5.7/en/where-optimization.html https://opensource.actionsky.com/20201127-mysql/ https://blog.csdn.net/CSDNcircular/article/details/107253747
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.