Databases 14 min read

Why MySQL count() Slows Down on Large Tables and How to Speed It Up

This article explains how MySQL's count() works across storage engines, why InnoDB scans rows and can time out on massive tables, and presents practical alternatives such as using EXPLAIN rows, a dedicated count table, batch processing, or binlog‑to‑Hive for efficient row‑count estimation.

macrozheng

Apr 11, 2025

Why MySQL count() Slows Down on Large Tables and How to Speed It Up

Creating the SMS Table

Assume you have an sms table that stores messages to be sent, with a state column where 0 means not yet sent. An asynchronous thread repeatedly picks rows with state=0, sends the SMS, and updates state to 1, gradually reducing the number of unsent messages.

For monitoring you may need to know how many rows still have state=0. The usual way is to run a COUNT() query. select count(*) from sms where state = 0; When the table is small this works fine, but with millions of unsent rows the query can become very slow and eventually time out.

How count() Works

The purpose of COUNT() is to count the number of non‑NULL rows returned by the query.

MySQL consists of a server layer and a storage‑engine layer. The storage engine (e.g., InnoDB or MyISAM) determines how the row count is obtained.

The table creation SQL specifies the engine, e.g., ENGINE=InnoDB, which selects the InnoDB engine.

Although the method is called COUNT() at the server level, its implementation differs between engines.

With MyISAM, the engine maintains a hidden row‑count field, so SELECT COUNT(*) FROM tbl simply reads that field and is extremely fast.

InnoDB does not store a row‑count field. Instead, it chooses the smallest index tree, traverses its leaf nodes, and adds up the rows, which requires scanning many pages and can be slow for large tables.

Why InnoDB Cannot Store a Simple Row‑Count Field

MyISAM does not support transactions, while InnoDB does. InnoDB uses MVCC to provide the default REPEATABLE READ isolation level. Because different transactions may see different row counts, InnoDB cannot keep a single static counter without breaking isolation guarantees.

Performance of Different count() Variants

The argument inside COUNT() slightly changes the work performed:

If the argument is *, the server receives rows from InnoDB and simply increments the counter, assuming no NULL values.

If the argument is 1, the server adds a constant 1 for each row, also without NULL checks.

If the argument is a specific column, InnoDB must return that column’s value, and the server must check for NULL before counting. The cost depends on whether the column is a primary key, an indexed column, or an unindexed column.

Overall performance ranking is:

count(*) ≈ count(1) > count(primary_key) > count(indexed_column) > count(unindexed_column)

Thus COUNT(*) is already the fastest, but it still requires scanning rows in InnoDB.

When Approximate Row Counts Are Sufficient

If you only need to know the order of magnitude of unsent messages, you can use the rows estimate from EXPLAIN. The rows column gives a sampled estimate of how many rows the optimizer expects to read, which is good enough for most monitoring scenarios.

When Exact Row Counts Are Required

For precise counts you can maintain a separate count_table that stores pre‑computed counts for various scenarios.

CREATE TABLE `count_table` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `cnt_what` char(20) NOT NULL DEFAULT '' COMMENT 'metric name',
  `cnt` tinyint NOT NULL COMMENT 'count value',
  PRIMARY KEY (`id`),
  KEY `idx_cnt_what` (`cnt_what`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

When a count is needed, read it directly:

select cnt from count_table where cnt_what = "未发送的短信数量";

To keep the count up‑to‑date you have two options:

High‑Frequency Updates

Include the UPDATE count_table statement inside the same transaction that inserts or deletes rows. This ensures the count respects transaction isolation and rolls back correctly, but it can cause contention under heavy write loads.

Low‑Frequency Updates

If real‑time accuracy is not required, you can run a periodic batch job that scans the table (e.g., 10 k rows at a time), records the maximum id, and updates the count table once per hour or day.

For even larger scales, you can stream binlog changes to Hive and run the count query there, leveraging existing data‑pipeline components.

Summary

MySQL COUNT() reads full‑table data; MyISAM stores a hidden row‑count field, while InnoDB must scan rows.

Performance order:

count(*) ≈ count(1) > count(primary key) > count(indexed column) > count(unindexed column)</li><li>For approximate counts, use <code>EXPLAIN

and its rows estimate.

For exact counts, maintain a dedicated count table and update it either within transactions (high‑frequency) or via periodic batch jobs (low‑frequency).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance SQL InnoDB MySQL MyISAM COUNT

Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.