
Why MySQL Unique Indexes Still Allow Duplicates and How to Fix Them

This article explores a common MySQL InnoDB pitfall where unique indexes fail to prevent duplicate rows, explains how NULL values and logical deletion affect uniqueness, and presents practical strategies—including composite keys, delete status counters, timestamps, hash fields, and Redis locks—to enforce true uniqueness.

macrozheng

Introduction

Recently I encountered a pitfall: a MySQL 8 InnoDB table with a unique index still allowed duplicate data. This article walks through that experience, explains why the unique index failed, and shares useful knowledge about unique indexes.

1. Reproducing the Issue

To prevent duplicate product groups, I created a "prevent‑duplicate" table and added a unique index on <code>category_id</code>, <code>unit_id</code>, and <code>model_hash</code>. The table definition is:

<code>CREATE TABLE `product_group_unique` (
  `id` bigint NOT NULL,
  `category_id` bigint NOT NULL,
  `unit_id` bigint NOT NULL,
  `model_hash` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `in_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;</code>

Adding the unique index:

<code>ALTER TABLE product_group_unique ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);</code>

When <code>model_hash</code> is non‑NULL, duplicates are blocked. However, after inserting rows where <code>model_hash</code> is <code>NULL</code>, MySQL allowed duplicate records.
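This is easy to reproduce; a minimal sketch (the <code>id</code> and column values are illustrative):

```sql
-- Both inserts succeed despite the unique index, because each NULL
-- in model_hash is treated as a distinct value.
INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (1, 100, 200, NULL, NOW());

INSERT INTO product_group_unique (id, category_id, unit_id, model_hash, in_date)
VALUES (2, 100, 200, NULL, NOW());

-- Both rows now exist with identical (category_id, unit_id, NULL).
```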

2. Unique Index Fields Containing NULL

If a column participating in a unique index contains <code>NULL</code>, MySQL treats each <code>NULL</code> as distinct, so the uniqueness constraint does not apply to those rows. This explains why inserting multiple rows with <code>model_hash = NULL</code> succeeded.

When a unique index column can be <code>NULL</code>, the uniqueness constraint may become ineffective.
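One common workaround (not part of the original setup, just a sketch) is to forbid <code>NULL</code> in the indexed column and use an empty-string sentinel instead, so every row participates in the uniqueness check:

```sql
-- Make the column NOT NULL; '' acts as the "no hash" sentinel value
-- and collides like any other value, restoring the constraint.
ALTER TABLE product_group_unique
  MODIFY `model_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '';
```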

3. Unique Index on Logically Deleted Tables

Logical deletion (using an <code>UPDATE</code> to set a <code>delete_status</code> flag) keeps the row in the table, which makes adding a unique index problematic because the deleted rows still occupy the unique key space.

Typical delete operation:

<code>DELETE FROM product WHERE id = 123;</code>

Logical delete operation:

<code>UPDATE product SET delete_status = 1, edit_time = NOW() WHERE id = 123;</code>

Because the row remains, a unique index on <code>name</code> and <code>model</code> prevents re‑inserting the same product even after it has been logically deleted.
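Assuming a simplified <code>product</code> table with <code>UNIQUE(name, model)</code> (the table and values here are illustrative), the failure looks like this:

```sql
-- The logically deleted row still holds the ('P30', 'X1') slot:
UPDATE product SET delete_status = 1, edit_time = NOW() WHERE id = 123;

-- Re-creating the "same" product now fails:
INSERT INTO product (name, model, delete_status) VALUES ('P30', 'X1', 0);
-- ERROR 1062 (23000): Duplicate entry ... for key ...
```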

3.1 Incremental Delete Status

Instead of a binary flag, use an ever‑increasing <code>delete_status</code> value (1, 2, 3, …). Each deletion increments the status, so the composite unique key stays unique.
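A sketch of this scheme (table, index name, and values are illustrative; active rows use <code>delete_status = 0</code>):

```sql
ALTER TABLE product ADD UNIQUE INDEX ux_name_model_del (name, model, delete_status);

-- Logical delete: move the row to the next unused delete_status for
-- this (name, model) pair, so it no longer collides with the active
-- row. The derived table is materialized, so MySQL permits reading
-- from the table being updated.
UPDATE product p
JOIN (SELECT COALESCE(MAX(delete_status), 0) + 1 AS next_status
      FROM product
      WHERE name = 'P30' AND model = 'X1') t
SET p.delete_status = t.next_status
WHERE p.id = 123;
```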

3.2 Add a Timestamp Field

Introduce a <code>timestamp</code> column and include it in the unique index (e.g., <code>name</code>, <code>model</code>, <code>delete_status</code>, <code>timestamp</code>). The timestamp changes on each logical delete, preserving uniqueness.
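A sketch, assuming a microsecond-precision column so two deletes of the same key are very unlikely to collide (column and index names are illustrative):

```sql
ALTER TABLE product ADD COLUMN delete_time datetime(6) NOT NULL
  DEFAULT '1970-01-01 00:00:00';
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_del_time (name, model, delete_status, delete_time);

-- Logical delete stamps the current time, freeing the unique slot
-- for a future re-insert:
UPDATE product SET delete_status = 1, delete_time = NOW(6) WHERE id = 123;
```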

3.3 Add an ID Field

Add a separate <code>delete_id</code> column. When a row is logically deleted, set <code>delete_id</code> to the row's primary key. The unique index on <code>name</code>, <code>model</code>, <code>delete_status</code>, <code>delete_id</code> then remains valid without altering the existing delete logic.
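A sketch of the <code>delete_id</code> variant (column and index names are illustrative; <code>0</code> marks "not deleted"):

```sql
ALTER TABLE product ADD COLUMN delete_id bigint NOT NULL DEFAULT 0;
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_del_id (name, model, delete_status, delete_id);

-- Logical delete: copy the primary key into delete_id, making each
-- deleted row's key combination unique. Active rows all share
-- delete_id = 0, where (name, model) alone keeps them distinct.
UPDATE product SET delete_status = 1, delete_id = id WHERE id = 123;
```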

4. Adding a Unique Index to Tables with Historical Duplicate Data

If a table already contains duplicate historical rows, first assign a distinguishing <code>delete_id</code> (or similar) to each row, then create the composite unique index on the chosen columns plus the new identifier.
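A sketch of the backfill, assuming the <code>delete_id</code> scheme from section 3.3 and MySQL 8's window functions (names and the keep-the-lowest-id policy are illustrative):

```sql
-- Step 1: keep the first row of each (name, model) group as the
-- "live" one (delete_id stays 0) and tag later duplicates with
-- their own primary key so they no longer collide.
UPDATE product p
JOIN (SELECT id,
             ROW_NUMBER() OVER (PARTITION BY name, model ORDER BY id) AS rn
      FROM product) t ON p.id = t.id
SET p.delete_id = p.id
WHERE t.rn > 1;

-- Step 2: with duplicates disambiguated, the index can be created.
ALTER TABLE product
  ADD UNIQUE INDEX ux_name_model_del_id (name, model, delete_status, delete_id);
```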

5. Unique Index on Large Columns

MySQL InnoDB limits an index key to 3072 bytes with the default DYNAMIC row format (767 bytes with the older COMPACT and REDUNDANT row formats). When a column such as <code>model</code> exceeds this limit, the unique index cannot be created directly.

5.1 Add a Hash Column

Store a short hash (e.g., 16‑ or 32‑character) of the large column in a separate field and build the unique index on the hash together with other columns. This avoids the length limitation, though hash collisions must be considered.
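One way to implement this in MySQL is a stored generated column (a sketch; <code>SHA2</code> and the column names are one possible choice, not the article's original schema):

```sql
-- model_hash is derived automatically from model and kept in sync
-- by MySQL; STORED makes it usable in the unique index.
ALTER TABLE product
  ADD COLUMN model_hash char(64)
    GENERATED ALWAYS AS (SHA2(model, 256)) STORED;

ALTER TABLE product
  ADD UNIQUE INDEX ux_category_unit_model (category_id, unit_id, model_hash);
```

Using a generated column avoids the failure mode where application code forgets to recompute the hash after updating <code>model</code>.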

5.2 Omit the Unique Index

Rely on application‑level mechanisms (single‑threaded jobs, MQ consumers, etc.) to guarantee uniqueness when an index is impractical.

5.3 Use Redis Distributed Locks

Generate a hash from the combination of fields and acquire a Redis lock on that hash before inserting. The hash keeps the lock key short; while collisions are theoretically possible, their probability is negligible in practice.

6. Bulk Insertion Scenarios

When inserting many rows, taking a Redis lock per row degrades performance. Instead, let MySQL's unique index handle duplicates: a single bulk <code>INSERT</code> will fail for duplicate rows, allowing the database to enforce uniqueness efficiently.
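If duplicates should be skipped rather than abort the whole statement, MySQL's <code>INSERT IGNORE</code> (or <code>INSERT ... ON DUPLICATE KEY UPDATE</code>) is one option; a sketch with illustrative values:

```sql
-- The second row collides on ux_category_unit_model and is
-- silently skipped; the first row is inserted normally.
INSERT IGNORE INTO product_group_unique
  (id, category_id, unit_id, model_hash, in_date)
VALUES
  (1, 100, 200, 'a1b2c3', NOW()),
  (2, 100, 200, 'a1b2c3', NOW());
```

Note that <code>INSERT IGNORE</code> downgrades other errors to warnings as well, so check the warning count after the statement.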

Tags: MySQL, Database Design, logical delete, unique index, NULL handling
Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
