Why MySQL Unique Indexes Still Allow Duplicates and How to Fix Them
This article explores a common MySQL InnoDB pitfall where unique indexes fail to prevent duplicate rows, explains how NULL values and logical deletion affect uniqueness, and presents practical strategies—including composite keys, delete status counters, timestamps, hash fields, and Redis locks—to enforce true uniqueness.
Introduction
Recently I encountered a pitfall: a MySQL 8 InnoDB table with a unique index still allowed duplicate data. This article walks through that experience, explains why the unique index failed, and shares useful knowledge about unique indexes.
1. Reproducing the Issue
To prevent duplicate product groups, I created a "prevent-duplicate" table and added a unique index on <code>category_id</code>, <code>unit_id</code>, and <code>model_hash</code>. The table definition is:
<code>CREATE TABLE `product_group_unique` (
`id` bigint NOT NULL,
`category_id` bigint NOT NULL,
`unit_id` bigint NOT NULL,
`model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
`in_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;</code>
Adding the unique index:
<code>ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);</code>
When <code>model_hash</code> is non-NULL, duplicates are blocked. However, after inserting rows where <code>model_hash</code> is <code>NULL</code>, MySQL allowed duplicate records.
2. Unique Index Fields Containing NULL
If a column participating in a unique index contains <code>NULL</code>, MySQL treats each <code>NULL</code> as distinct, so the uniqueness constraint does not apply to those rows. This explains why inserting multiple rows with <code>model_hash = NULL</code> succeeded.
In short: when a unique index column can be <code>NULL</code>, the uniqueness constraint may become ineffective.
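A minimal reproduction against the table above (the values are assumed for illustration); one common fix, shown at the end, is to forbid <code>NULL</code> entirely:
<code>-- Both inserts succeed: each NULL in the unique index is treated as distinct
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 100, 200, NULL, NOW());
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 100, 200, NULL, NOW());

-- Fix: make the column NOT NULL so the constraint always applies
ALTER TABLE product_group_unique
  MODIFY model_hash varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';</code>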
3. Unique Index on Logically Deleted Tables
Logical deletion (using an <code>UPDATE</code> to set a <code>delete_status</code> flag) keeps the row in the table, which makes adding a unique index problematic because the deleted rows still occupy the unique key space.
Typical delete operation:
<code>DELETE FROM product WHERE id = 123;</code>
Logical delete operation:
<code>UPDATE product SET delete_status = 1, edit_time = NOW() WHERE id = 123;</code>
Because the row remains, a unique index on <code>name</code> and <code>model</code> prevents re-inserting the same product even after it has been logically deleted.
3.1 Incremental Delete Status
Instead of a binary flag, use an ever-increasing <code>delete_status</code> value (1, 2, 3, …): active rows keep 0, and each logical delete sets the status one higher than any previously deleted copy of the same key, so the composite unique key stays unique.
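A sketch of the idea, assuming a <code>product</code> table with <code>name</code>, <code>model</code>, and <code>delete_status</code> columns (0 = active); the sample values are illustrative:
<code>ALTER TABLE product ADD UNIQUE INDEX ux_name_model_status (name, model, delete_status);

-- Logical delete: set delete_status one higher than any previously deleted copy.
-- The derived table avoids MySQL's restriction on selecting from the table being updated.
UPDATE product p
JOIN (SELECT COALESCE(MAX(delete_status), 0) + 1 AS next_status
      FROM product WHERE name = 'phone' AND model = 'X1') m
SET p.delete_status = m.next_status, p.edit_time = NOW()
WHERE p.id = 123;</code>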
3.2 Add a Timestamp Field
Introduce a timestamp column and include it in the unique index (e.g., <code>name</code>, <code>model</code>, <code>delete_status</code>, <code>timestamp</code>). Active rows all share a fixed default value, so duplicates among live rows still collide, while the timestamp is set to the current time on each logical delete, preserving uniqueness among deleted copies.
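A sketch, with <code>del_time</code> as an assumed column name; <code>datetime(3)</code> and <code>NOW(3)</code> give millisecond resolution, which reduces (but does not eliminate) the chance of two deletes of the same key colliding at the same instant:
<code>ALTER TABLE product ADD COLUMN del_time datetime(3) NOT NULL DEFAULT '1970-01-01 00:00:00';
ALTER TABLE product ADD UNIQUE INDEX ux_name_model_status_time (name, model, delete_status, del_time);

-- Logical delete stamps the row with the current time;
-- active rows all share the fixed default, so live duplicates still collide
UPDATE product SET delete_status = 1, del_time = NOW(3) WHERE id = 123;</code>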
3.3 Add an ID Field
Add a separate <code>delete_id</code> column, defaulting to 0 for active rows. When a row is logically deleted, set <code>delete_id</code> to the row's primary key. The unique index on <code>name</code>, <code>model</code>, <code>delete_status</code>, <code>delete_id</code> then remains valid with only a minimal change to the existing delete logic.
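A sketch under the same assumed <code>product</code> schema:
<code>ALTER TABLE product ADD COLUMN delete_id bigint NOT NULL DEFAULT 0;
ALTER TABLE product ADD UNIQUE INDEX ux_name_model_status_delid (name, model, delete_status, delete_id);

-- Logical delete: copy the primary key into delete_id, making every
-- deleted row unique while active rows keep delete_id = 0
UPDATE product SET delete_status = 1, delete_id = id, edit_time = NOW() WHERE id = 123;</code>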
4. Adding a Unique Index to Tables with Historical Duplicate Data
If a table already contains duplicate historical rows, first assign a distinguishing <code>delete_id</code> (or similar) value to each duplicate row, then create the composite unique index on the chosen columns plus the new identifier.
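For example, using the <code>delete_id</code> approach from section 3.3 (column names assumed), historical rows can be disambiguated before the index is created:
<code>-- Backfill: give each logically deleted historical row a distinct delete_id
UPDATE product SET delete_id = id WHERE delete_status = 1 AND delete_id = 0;

-- Any remaining duplicates among active rows must be merged or deleted first; then:
ALTER TABLE product ADD UNIQUE INDEX ux_name_model_status_delid (name, model, delete_status, delete_id);</code>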
5. Unique Index on Large Columns
MySQL 8's InnoDB limits an index key to 3072 bytes with the default DYNAMIC row format (only 767 bytes for the older REDUNDANT and COMPACT formats). With utf8mb4, where a character can occupy up to 4 bytes, a long <code>varchar</code> column such as <code>model</code> can exceed this limit, and creating the unique index fails with error 1071 ("Specified key was too long").
5.1 Add a Hash Column
Store a short hash (e.g., 16‑ or 32‑character) of the large column in a separate field and build the unique index on the hash together with other columns. This avoids the length limitation, though hash collisions must be considered.
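A sketch using MySQL's built-in <code>MD5()</code>, which yields a 32-character hex string (any hash function with an acceptable collision rate works; column names are assumed):
<code>ALTER TABLE product ADD COLUMN model_hash varchar(32) NOT NULL DEFAULT '';

-- Populate the hash from the oversized column
UPDATE product SET model_hash = MD5(model);

-- Index the short hash instead of the large model column itself
ALTER TABLE product ADD UNIQUE INDEX ux_name_model_hash (name, model_hash);</code>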
5.2 Omit the Unique Index
Rely on application‑level mechanisms (single‑threaded jobs, MQ consumers, etc.) to guarantee uniqueness when an index is impractical.
5.3 Use Redis Distributed Locks
Generate a hash from the combination of fields and acquire a Redis distributed lock keyed on that hash before inserting. Two different field combinations could in principle hash to the same lock key, but the probability is low, and the consequence is only brief unnecessary contention rather than duplicate data.
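A sketch in raw Redis commands (the key name and TTL are assumptions): the application computes the hash, acquires the lock, checks for an existing row, inserts, and then releases:
<code># Acquire: succeeds only if the key does not already exist; auto-expires after 30 s
SET lock:product:{field_hash} 1 NX EX 30

# ... holding the lock: SELECT to check for an existing row, then INSERT ...

# Release once the insert has committed
DEL lock:product:{field_hash}</code>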
6. Bulk Insertion Scenarios
When inserting many rows, acquiring a Redis lock per row degrades performance. Instead, let MySQL's unique index handle duplicates: a bulk <code>INSERT</code> fails on duplicate rows (or skips them with <code>INSERT IGNORE</code>), letting the database enforce uniqueness efficiently.
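For example, with the <code>ux_category_unit_model</code> index from section 1, <code>INSERT IGNORE</code> lets the rest of a batch proceed when some rows are duplicates (sample values assumed):
<code>INSERT IGNORE INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (10, 100, 200, 'a1b2', NOW()),
       (11, 100, 200, 'a1b2', NOW());  -- duplicate unique key: skipped, not an error</code>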
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.