Why MySQL count() Slows Down on Large Tables and How to Speed It Up
This article explains how MySQL's count() works across storage engines, why InnoDB scans rows and can time out on massive tables, and presents practical alternatives such as using EXPLAIN rows, a dedicated count table, batch processing, or binlog‑to‑Hive for efficient row‑count estimation.
Creating the SMS Table
Assume you have an sms table that stores messages to be sent, with a
statecolumn where
0means not yet sent. An asynchronous thread repeatedly picks rows with
state=0, sends the SMS, and updates
stateto
1, gradually reducing the number of unsent messages.
For monitoring you may need to know how many rows still have
state=0. The usual way is to run a
COUNT()query.
<code>select count(*) from sms where state = 0;</code>When the table is small this works fine, but with millions of unsent rows the query can become very slow and eventually time out.
How count() Works
The purpose of
COUNT()is to count the number of non‑NULL rows returned by the query.
MySQL consists of a server layer and a storage‑engine layer. The storage engine (e.g., InnoDB or MyISAM) determines how the row count is obtained.
The table creation SQL specifies the engine, e.g.,
ENGINE=InnoDB, which selects the InnoDB engine.
Although the method is called
COUNT()at the server level, its implementation differs between engines.
With MyISAM, the engine maintains a hidden row‑count field, so
SELECT COUNT(*) FROM tblsimply reads that field and is extremely fast.
InnoDB does not store a row‑count field. Instead, it chooses the smallest index tree, traverses its leaf nodes, and adds up the rows, which requires scanning many pages and can be slow for large tables.
Why InnoDB Cannot Store a Simple Row‑Count Field
MyISAM does not support transactions, while InnoDB does. InnoDB uses MVCC to provide the default REPEATABLE READ isolation level. Because different transactions may see different row counts, InnoDB cannot keep a single static counter without breaking isolation guarantees.
Performance of Different count() Variants
The argument inside
COUNT()slightly changes the work performed:
If the argument is
*, the server receives rows from InnoDB and simply increments the counter, assuming no NULL values.
If the argument is
1, the server adds a constant
1for each row, also without NULL checks.
If the argument is a specific column, InnoDB must return that column’s value, and the server must check for NULL before counting. The cost depends on whether the column is a primary key, an indexed column, or an unindexed column.
Overall performance ranking is:
<code>count(*) ≈ count(1) > count(primary_key) > count(indexed_column) > count(unindexed_column)</code>Thus
COUNT(*)is already the fastest, but it still requires scanning rows in InnoDB.
When Approximate Row Counts Are Sufficient
If you only need to know the order of magnitude of unsent messages, you can use the
rowsestimate from
EXPLAIN. The
rowscolumn gives a sampled estimate of how many rows the optimizer expects to read, which is good enough for most monitoring scenarios.
When Exact Row Counts Are Required
For precise counts you can maintain a separate
count_tablethat stores pre‑computed counts for various scenarios.
<code>CREATE TABLE `count_table` (
`id` int NOT NULL AUTO_INCREMENT COMMENT 'primary key',
`cnt_what` char(20) NOT NULL DEFAULT '' COMMENT 'metric name',
`cnt` tinyint NOT NULL COMMENT 'count value',
PRIMARY KEY (`id`),
KEY `idx_cnt_what` (`cnt_what`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;</code>When a count is needed, read it directly:
<code>select cnt from count_table where cnt_what = "未发送的短信数量";</code>To keep the count up‑to‑date you have two options:
High‑Frequency Updates
Include the
UPDATE count_tablestatement inside the same transaction that inserts or deletes rows. This ensures the count respects transaction isolation and rolls back correctly, but it can cause contention under heavy write loads.
Low‑Frequency Updates
If real‑time accuracy is not required, you can run a periodic batch job that scans the table (e.g., 10 k rows at a time), records the maximum
id, and updates the count table once per hour or day.
For even larger scales, you can stream binlog changes to Hive and run the count query there, leveraging existing data‑pipeline components.
Summary
MySQL
COUNT()reads full‑table data; MyISAM stores a hidden row‑count field, while InnoDB must scan rows.
Performance order:
count(*) ≈ count(1) > count(primary key) > count(indexed column) > count(unindexed column)</li><li>For approximate counts, use <code>EXPLAINand its
rowsestimate.
For exact counts, maintain a dedicated count table and update it either within transactions (high‑frequency) or via periodic batch jobs (low‑frequency).
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.