Databases 14 min read

Why MySQL count() Slows Down on Large Tables and How to Speed It Up

This article explains how MySQL's count() works across storage engines, why InnoDB scans rows and can time out on massive tables, and presents practical alternatives such as using EXPLAIN rows, a dedicated count table, batch processing, or binlog‑to‑Hive for efficient row‑count estimation.

macrozheng
macrozheng
macrozheng
Why MySQL count() Slows Down on Large Tables and How to Speed It Up

Creating the SMS Table

Assume you have an sms table that stores messages to be sent, with a

state

column where

0

means not yet sent. An asynchronous thread repeatedly picks rows with

state=0

, sends the SMS, and updates

state

to

1

, gradually reducing the number of unsent messages.

For monitoring you may need to know how many rows still have

state=0

. The usual way is to run a

COUNT()

query.

<code>select count(*) from sms where state = 0;</code>

When the table is small this works fine, but with millions of unsent rows the query can become very slow and eventually time out.

How count() Works

The purpose of

COUNT()

is to count the number of non‑NULL rows returned by the query.

MySQL consists of a server layer and a storage‑engine layer. The storage engine (e.g., InnoDB or MyISAM) determines how the row count is obtained.

Mysql Architecture
Mysql Architecture

The table creation SQL specifies the engine, e.g.,

ENGINE=InnoDB

, which selects the InnoDB engine.

Although the method is called

COUNT()

at the server level, its implementation differs between engines.

With MyISAM, the engine maintains a hidden row‑count field, so

SELECT COUNT(*) FROM tbl

simply reads that field and is extremely fast.

InnoDB does not store a row‑count field. Instead, it chooses the smallest index tree, traverses its leaf nodes, and adds up the rows, which requires scanning many pages and can be slow for large tables.

Why InnoDB Cannot Store a Simple Row‑Count Field

MyISAM does not support transactions, while InnoDB does. InnoDB uses MVCC to provide the default REPEATABLE READ isolation level. Because different transactions may see different row counts, InnoDB cannot keep a single static counter without breaking isolation guarantees.

Performance of Different count() Variants

The argument inside

COUNT()

slightly changes the work performed:

If the argument is

*

, the server receives rows from InnoDB and simply increments the counter, assuming no NULL values.

If the argument is

1

, the server adds a constant

1

for each row, also without NULL checks.

If the argument is a specific column, InnoDB must return that column’s value, and the server must check for NULL before counting. The cost depends on whether the column is a primary key, an indexed column, or an unindexed column.

Overall performance ranking is:

<code>count(*) ≈ count(1) > count(primary_key) > count(indexed_column) > count(unindexed_column)</code>

Thus

COUNT(*)

is already the fastest, but it still requires scanning rows in InnoDB.

When Approximate Row Counts Are Sufficient

If you only need to know the order of magnitude of unsent messages, you can use the

rows

estimate from

EXPLAIN

. The

rows

column gives a sampled estimate of how many rows the optimizer expects to read, which is good enough for most monitoring scenarios.

EXPLAIN rows
EXPLAIN rows

When Exact Row Counts Are Required

For precise counts you can maintain a separate

count_table

that stores pre‑computed counts for various scenarios.

<code>CREATE TABLE `count_table` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `cnt_what` char(20) NOT NULL DEFAULT '' COMMENT 'metric name',
  `cnt` tinyint NOT NULL COMMENT 'count value',
  PRIMARY KEY (`id`),
  KEY `idx_cnt_what` (`cnt_what`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;</code>

When a count is needed, read it directly:

<code>select cnt from count_table where cnt_what = "未发送的短信数量";</code>

To keep the count up‑to‑date you have two options:

High‑Frequency Updates

Include the

UPDATE count_table

statement inside the same transaction that inserts or deletes rows. This ensures the count respects transaction isolation and rolls back correctly, but it can cause contention under heavy write loads.

Update count in transaction
Update count in transaction

Low‑Frequency Updates

If real‑time accuracy is not required, you can run a periodic batch job that scans the table (e.g., 10 k rows at a time), records the maximum

id

, and updates the count table once per hour or day.

Batch processing
Batch processing

For even larger scales, you can stream binlog changes to Hive and run the count query there, leveraging existing data‑pipeline components.

MySQL to Hive sync
MySQL to Hive sync

Summary

MySQL

COUNT()

reads full‑table data; MyISAM stores a hidden row‑count field, while InnoDB must scan rows.

Performance order:

count(*) ≈ count(1) > count(primary key) > count(indexed column) > count(unindexed column)</li><li>For approximate counts, use <code>EXPLAIN

and its

rows

estimate.

For exact counts, maintain a dedicated count table and update it either within transactions (high‑frequency) or via periodic batch jobs (low‑frequency).

monitoringperformanceSQLInnoDBMySQLMyISAMcount()
macrozheng
Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.