
How to Choose the Optimal Prefix Length for MySQL Indexes

This article explains why indexing long character columns can bloat MySQL indexes, introduces the concept of index selectivity, shows how to calculate the best prefix length with real‑world queries, and demonstrates performance differences between short and optimal index prefixes.

Architecture & Thinking

1 Background

When indexing a character-type column, long values make the index large and slow. InnoDB pages are 16KB by default, so the more index entries fit in a page, the shallower the B+Tree and the faster the lookup. Indexing the whole column makes every entry larger, so the index needs more pages, the tree gets deeper, and lookups cost more I/O. Specifying a prefix length (indexing only the first few characters) shrinks each entry, so more entries fit per page and performance improves.

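<code># If emp_name values are generally long, this index grows very large
create index idx_emp_empname on emp(emp_name);

# Alternative that may be more suitable: index only the first 5 characters
create index idx_emp_empname on emp(emp_name(5));
</code>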
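
To see how much space an index actually takes, one option is to read InnoDB's persistent statistics. The query below is a minimal sketch, assuming persistent statistics are enabled (the default in recent MySQL versions) and that the current user can read the mysql schema. The figures are estimates, not exact sizes.

<code># Number of pages used by each index on emp, converted to megabytes.
# The statistics are estimates; ANALYZE TABLE emp refreshes them.
SELECT index_name,
       stat_value AS pages,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'emp'
  AND stat_name = 'size';
</code>

Running it once with the full-column index and once with the prefix index gives a rough before-and-after comparison.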

2 About Index Selectivity

Index selectivity is the ratio of distinct indexed values (cardinality) to the total number of rows, ranging from 1/#T to 1. The higher the selectivity, the more rows an index lookup filters out and the faster the query. A unique index has selectivity 1, the best possible.
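
As a small illustration of this definition, the selectivity of the full emp_name column can be measured directly. This is a sketch against the article's emp table; the alias full_selectivity is just an illustrative name.

<code># Selectivity of the full emp_name column.
# No prefix of emp_name can be more selective than this value.
SELECT count(DISTINCT emp_name) / count(*) AS full_selectivity
FROM emp;
</code>

The result is a useful baseline: a prefix whose selectivity is close to it loses little compared with indexing the whole column.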

For example, limiting the index on emp_name to its first 4 characters keeps long strings from occupying excessive space, which would otherwise increase B+Tree depth and I/O.

Note: #T denotes the total row count. According to "High Performance MySQL" (section 5.3.2), a prefix often provides sufficient selectivity for queries. For BLOB and TEXT columns, and for VARCHAR columns longer than the maximum index key length, MySQL requires a prefix index because it cannot index the full value.
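
The prefix requirement for TEXT columns can be sketched as follows. The table emp_note, its columns, and the prefix length 20 are hypothetical, chosen only for illustration.

<code># Hypothetical table used only for illustration
CREATE TABLE emp_note (
    id   INT PRIMARY KEY,
    note TEXT
);

# Rejected with error 1170:
# BLOB/TEXT column 'note' used in key specification without a key length
# CREATE INDEX idx_emp_note ON emp_note(note);

# Accepted: index only the first 20 characters of note
CREATE INDEX idx_emp_note ON emp_note(note(20));
</code>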

3 How to Calculate Index Selectivity

3.1 Analysis

Consider an emp table with 5 million rows. We want to index the emp_name column, which contains many long values. First, inspect distinct names:

<code>mysql> select distinct emp_name from emp;
+----------------+
| emp_name       |
+----------------+
| ali_user       |
| brandy_user    |
| cancys_user    |
| davide_user    |
| elasne_user    |
| finest_user    |
| .........      |
| gagnannan_user |
| halande_user   |
| ivil_user      |
| kikol_user     |
| Cecilia_user   |
| Abigail_user   |
| Blanche_user   |
| Elizabeth_user |
| .......        |
+----------------+
4999850 rows in set
</code>

From this data we conclude:

The prefix must not be too short (e.g., 2-3 characters); the first 6 characters provide high discrimination.

Indexing the full column (no prefix length) also stores the common suffix "_user", which adds no discrimination and wastes space.

Lengths of 6 or 7 look most suitable, but this needs to be verified by calculation.

3.2 Determining Prefix Length

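Compute the selectivity of a candidate prefix, where c_name is the column, calcul_len is the prefix length, and t_name is the table: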

<code>select count(distinct left(`c_name`,calcul_len)) / count(*) from t_name;</code>

Applied to emp for prefix lengths 3 through 7:

<code>SELECT
    count(DISTINCT LEFT(emp_name, 3)) / count(*) AS sel3,
    count(DISTINCT LEFT(emp_name, 4)) / count(*) AS sel4,
    count(DISTINCT LEFT(emp_name, 5)) / count(*) AS sel5,
    count(DISTINCT LEFT(emp_name, 6)) / count(*) AS sel6,
    count(DISTINCT LEFT(emp_name, 7)) / count(*) AS sel7
FROM emp;</code>

Result:

<code>+--------+--------+--------+--------+--------+
| sel3   | sel4   | sel5   | sel6   | sel7   |
+--------+--------+--------+--------+--------+
| 0.0012 | 0.0076 | 0.0400 | 0.1713 | 0.1713 |
+--------+--------+--------+--------+--------+
1 row in set
</code>
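
Average selectivity can hide skew: a few very common prefixes may still match a large number of rows each. A check along the lines suggested in "High Performance MySQL" is to list the most frequent prefixes. This is a sketch; the prefix length 6 and the aliases pref and cnt are illustrative choices.

<code># The ten most common 6-character prefixes and how many rows share each
SELECT LEFT(emp_name, 6) AS pref,
       count(*) AS cnt
FROM emp
GROUP BY pref
ORDER BY cnt DESC
LIMIT 10;
</code>

If the top counts are far larger than the rest, lookups on those prefixes will still be slow, and a longer prefix may be worth testing.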

3.3 Adding Prefix Index

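Selectivity stops improving beyond length 6 (sel6 and sel7 are both 0.1713), so a 6-character prefix is the shortest one that captures the available discrimination. The syntax for adding a prefix index is: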

<code>ALTER TABLE t_name ADD KEY (c_name[(length)]);
</code>
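
Applied to the emp table, the syntax might look like the sketch below. The index name idx_emp_empname_p6 is hypothetical, chosen so it does not clash with the index used in the timing test, and it is dropped again at the end.

<code># Add a 6-character prefix index on emp_name
ALTER TABLE emp ADD KEY idx_emp_empname_p6 (emp_name(6));

# The Sub_part column reports the prefix length;
# it is NULL when the whole column is indexed
SHOW INDEX FROM emp;

# Remove the illustrative index again
ALTER TABLE emp DROP KEY idx_emp_empname_p6;
</code>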

Testing different lengths on emp_name (average length ≈ 6 characters):

<code>create index idx_emp_empname on emp(emp_name(2));
Query OK, 0 rows affected
select * from emp where emp_name='LsHfFJA';
... (1.793 sec)
</code>

With length 2, selectivity can be no higher than the 0.0012 measured for length 3, so each index lookup still matches a large number of rows and the query is slow.

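<code># Drop the 2-character index first, because the same index name is reused
drop index idx_emp_empname on emp;
create index idx_emp_empname on emp(emp_name(6));
Query OK, 0 rows affected
select * from emp where emp_name='LsHfFJA';
... (0.003 sec)
</code>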

Length 6 yields a selectivity of 0.1713, and the same query returns in 0.003 sec instead of 1.793 sec.
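
Timings vary with caching and load, so it can help to confirm the difference with the execution plan as well. A minimal sketch, reusing the query from the test:

<code># key should name the prefix index when the optimizer uses it;
# rows is the optimizer's estimate of how many rows it must examine
EXPLAIN SELECT * FROM emp WHERE emp_name = 'LsHfFJA';
</code>

Comparing the rows estimate for the 2-character and 6-character indexes shows how much work the longer prefix saves.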

4 Summary

Choose a prefix long enough to achieve high selectivity while avoiding unnecessary space consumption. Ideally, the prefix cardinality should approach that of the full column to deliver excellent performance.
