
Effectiveness of Adding an Index on a Status Column in a Tens‑of‑Millions Row MySQL Table

This article explains how adding an index to a status column in a tens‑of‑millions‑row MySQL table affects query performance, covering the basic index lookup process, extreme cases where full scans are chosen, selectivity, covering indexes, composite indexes, partitioning, and using EXPLAIN to verify execution plans.

IT Services Circle

In a recent interview, a candidate was asked whether adding an index to the status field of a table containing tens of millions of rows would be effective. This article examines the question from several angles.

1. Simple execution flow with an index – Consider a table order_info with a column order_status and a regular B+‑tree index idx_order_status. A query such as SELECT * FROM order_info WHERE order_status = 'FINISHED' follows these steps:

Start at the root of the B+ tree and descend level by level, using a binary search within each node, until reaching the first leaf entry whose value is FINISHED.

Scan forward along the linked leaf nodes and collect the primary‑key IDs of every entry that matches FINISHED.

For each ID, the engine returns to the primary‑key index (the clustered index) to fetch the full row, a process known as a “back‑table lookup”.
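The lookup flow above can be made concrete with a minimal schema sketch. Only order_info, order_status, and idx_order_status come from the text; the other column names, the types, and the choice of InnoDB are assumptions for illustration.

```sql
-- Hypothetical schema: every column except order_status is assumed.
CREATE TABLE order_info (
    order_id     BIGINT      NOT NULL AUTO_INCREMENT,
    client_no    VARCHAR(32) NOT NULL,
    order_status VARCHAR(16) NOT NULL,
    create_time  DATETIME    NOT NULL,
    PRIMARY KEY (order_id)   -- clustered index: leaf pages hold the full rows
) ENGINE = InnoDB;

-- Secondary B+-tree index: each leaf entry holds (order_status, order_id).
CREATE INDEX idx_order_status ON order_info (order_status);

-- The first two steps run against idx_order_status; the third step fetches
-- each row by order_id from the clustered index.
SELECT * FROM order_info WHERE order_status = 'FINISHED';
```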

2. Extreme scenarios – index vs. full table scan – If every row in the table has order_status = 'FINISHED', the optimizer recognises that scanning the entire table is cheaper than traversing the index and performing one back‑table lookup per row, so it chooses a full table scan. Even when only a few rows have a different status, a query for the dominant value still matches almost the whole table, and index navigation plus random back‑table I/O costs more than a sequential scan, so the optimizer ignores the index. The reverse also holds: a query for one of the rare values matches very few rows, and for that query the index is likely to be used and to help.
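One way to see which side of this trade‑off a table is on is to look at the value distribution and then compare plans with the index forced and ignored. This is a sketch that assumes the order_info table and idx_order_status index from section 1; FORCE INDEX and IGNORE INDEX are standard MySQL index hints.

```sql
-- How skewed is the column? One row per status with its row count.
SELECT order_status, COUNT(*) AS row_count
FROM order_info
GROUP BY order_status
ORDER BY row_count DESC;

-- Estimated plan when the optimizer is pushed towards the index.
EXPLAIN SELECT * FROM order_info FORCE INDEX (idx_order_status)
WHERE order_status = 'FINISHED';

-- Estimated plan when the index is not allowed.
EXPLAIN SELECT * FROM order_info IGNORE INDEX (idx_order_status)
WHERE order_status = 'FINISHED';
```

On MySQL 8.0.18 or later, replacing EXPLAIN with EXPLAIN ANALYZE reports actual timings, but it executes the query, so on a table of this size it is safer to try it on a replica or a test copy.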

3. Selectivity (Cardinality) of order_status – The effectiveness of an index depends on how many distinct values the column holds relative to the number of rows. High cardinality (many different values, little duplication) makes the index selective and beneficial. Low cardinality (e.g., only three possible statuses) reduces selectivity, so the optimizer may prefer a full scan. You can inspect the index cardinality with SHOW INDEX FROM order_info WHERE Key_name = 'idx_order_status'. The Cardinality value it reports is an estimate based on table statistics, not an exact count.
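The check described above can be sketched as follows, again assuming the order_info table and idx_order_status index. The selectivity query is an addition for illustration: it computes distinct values divided by total rows, and it has to read the data, so it is not free on a large table.

```sql
-- Refresh the statistics that the Cardinality column is based on.
ANALYZE TABLE order_info;

-- Estimated number of distinct values in the index.
SHOW INDEX FROM order_info WHERE Key_name = 'idx_order_status';

-- Exact selectivity: close to 1 means almost unique, close to 0 means
-- heavily duplicated values.
SELECT COUNT(DISTINCT order_status) / COUNT(*) AS selectivity
FROM order_info;
```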

4. Impact of result‑set size – When a query returns a large proportion of the table (e.g., all rows with status “FINISHED”), the engine may still opt for a full scan because the back‑table lookups become expensive. A common optimisation is to use a covering index that includes all columns needed by the query, such as a composite index on order_status and order_id, allowing the engine to satisfy the query directly from the index without back‑table access. This only works when the query selects just the indexed columns; a SELECT * needs every column of the row and cannot be covered by such an index.
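A minimal sketch of the covering‑index idea, using the two columns named above. The index name idx_order_status_order_id is hypothetical.

```sql
-- Composite index on the two columns mentioned in the text.
CREATE INDEX idx_order_status_order_id ON order_info (order_status, order_id);

-- Only indexed columns are selected, so no clustered-index lookup is needed.
-- The Extra column of the plan is expected to include "Using index".
EXPLAIN SELECT order_id
FROM order_info
WHERE order_status = 'FINISHED';

-- SELECT * needs columns that are not in the index, so it cannot be covered.
EXPLAIN SELECT *
FROM order_info
WHERE order_status = 'FINISHED';
```

If order_id happens to be the InnoDB primary key, every secondary index already stores it in its leaf entries, so the single‑column idx_order_status would cover the first query as well and the extra index would be unnecessary.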

5. Combining other fields – Real‑world queries often filter by additional criteria. Adding composite indexes (e.g., idx_client_no_order_status for customer‑specific queries or idx_create_time_order_status for recent‑order queries) can improve performance substantially, because the extra column narrows the matching range far more than the status alone.
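The two composite indexes named above might be defined as below. The column names client_no and create_time are inferred from the index names, and the literal values in the queries are placeholders.

```sql
CREATE INDEX idx_client_no_order_status   ON order_info (client_no, order_status);
CREATE INDEX idx_create_time_order_status ON order_info (create_time, order_status);

-- Customer-specific query: equality on both columns, so both parts of
-- idx_client_no_order_status bound the index range.
SELECT * FROM order_info
WHERE client_no = 'C10001' AND order_status = 'FINISHED';

-- Recent-order query: the range on create_time bounds the index scan, and
-- order_status is checked against the entries inside that range.
SELECT * FROM order_info
WHERE create_time >= NOW() - INTERVAL 7 DAY AND order_status = 'FINISHED';
```

Column order matters here. With a range condition on the leading column, only that column limits how much of the index is scanned, so it can be worth testing the reverse order (order_status first) for the recent‑order query.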

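6. Partitioned tables – For tables with tens of millions of rows, partitioning by order_status or by another field that queries routinely filter on (for example a date column) can limit the amount of data scanned, because only the relevant partitions are accessed. Note that order_status is a low‑cardinality column, so it suits list partitioning rather than serving as a high‑selectivity key, and MySQL requires the partitioning column to be part of every unique key on the table, including the primary key.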
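As a sketch of what partitioning by status could look like, the example below uses LIST COLUMNS partitioning. The table name, the column definitions, and the status values other than FINISHED are assumptions for illustration.

```sql
-- Hypothetical partitioned variant of the table.
CREATE TABLE order_info_partitioned (
    order_id     BIGINT      NOT NULL,
    client_no    VARCHAR(32) NOT NULL,
    order_status VARCHAR(16) NOT NULL,
    create_time  DATETIME    NOT NULL,
    -- The partitioning column must be part of the primary key.
    PRIMARY KEY (order_id, order_status)
) ENGINE = InnoDB
PARTITION BY LIST COLUMNS (order_status) (
    PARTITION p_created  VALUES IN ('CREATED'),
    PARTITION p_paid     VALUES IN ('PAID'),
    PARTITION p_finished VALUES IN ('FINISHED')
);

-- The partitions column of the plan shows which partitions are read.
EXPLAIN SELECT * FROM order_info_partitioned
WHERE order_status = 'FINISHED';
```

Two trade‑offs of this layout: inserting a status value that is not listed in any partition fails, and updating order_status moves the row to a different partition, which makes status changes more expensive than on an unpartitioned table.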

7. Using EXPLAIN to verify the plan – After writing the query, run EXPLAIN SELECT * FROM order_info WHERE order_status = 'FINISHED' to see whether the optimizer uses the index (type = ref or range) or falls back to a full scan (type = ALL).
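The check can be run as shown below. The comments list the plan columns that are usually most relevant to this question; no sample output is given because it depends on the data and the MySQL version.

```sql
EXPLAIN SELECT * FROM order_info WHERE order_status = 'FINISHED';
-- Columns worth reading in the result:
--   type  : ref or range means index access, ALL means a full table scan
--   key   : the index actually chosen (NULL when none is used)
--   rows  : estimated number of rows to examine
--   Extra : for example "Using index" when a covering index is sufficient

-- The JSON format additionally exposes the optimizer's cost estimates.
EXPLAIN FORMAT=JSON SELECT * FROM order_info WHERE order_status = 'FINISHED';
```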
