Databases 6 min read

Deleting Duplicate Records in MySQL with DELETE JOIN and Subqueries

This article explains how to identify and remove duplicate rows in a MySQL table—keeping a single record per duplicate group—by using a self‑join DELETE statement or a nested subquery approach, complete with sample table creation and data insertion scripts.

Full-Stack Internet Architecture
Full-Stack Internet Architecture
Full-Stack Internet Architecture
Deleting Duplicate Records in MySQL with DELETE JOIN and Subqueries

Duplicate records in a database table are common due to data migration, bugs, or manual testing inserts, and they often violate business rules; the article shows how to delete the extra rows while preserving one instance using a single SQL statement in MySQL.

1. Preparation

Create a sample t_customer table and insert test data that includes duplicate entries based on cust_name , gender , and email fields.

CREATE TABLE `t_customer` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键', `cust_name` varchar(32) NOT NULL COMMENT '名字', `gender` varchar(10) NOT NULL COMMENT '性别', `email` varchar(32) NOT NULL COMMENT 'email', PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `t_customer` VALUES ('1','Tom','Male','[email protected]'), ('2','Jennifer','Female','[email protected]'), ('3','Tom','Male','[email protected]'), ('4','John','Male','[email protected]'), ('5','Jennifer','Female','[email protected]'), ('6','Tom','Male','[email protected]');

According to the rule that rows with identical cust_name , gender , and email are considered duplicates, the rows with id 1, 3, 6 and 2, 5 are duplicates.

2. Method 1 – DELETE JOIN

Use a self‑join to delete the rows with a smaller id , keeping the row with the largest id for each duplicate group:

DELETE c1 FROM t_customer c1 INNER JOIN t_customer c2 WHERE c1.id < c2.id AND c1.cust_name = c2.cust_name AND c1.gender = c2.gender AND c1.email = c2.email;

After execution, only one row per duplicate group remains.

3. Method 2 – Subquery

Use GROUP BY with HAVING COUNT(*) > 1 to find duplicate groups, then delete rows whose id is not the maximum within each group:

DELETE FROM t_customer WHERE (cust_name, gender, email) IN ( SELECT cust_name, gender, email FROM ( SELECT cust_name, gender, email FROM t_customer GROUP BY cust_name, gender, email HAVING COUNT(*) > 1 ) t1 ) AND id NOT IN ( SELECT id FROM ( SELECT MAX(id) AS id FROM t_customer GROUP BY cust_name, gender, email HAVING COUNT(*) > 1 ) t2 );

This approach is more complex due to nested subqueries, but achieves the same result.

When running such statements on a production database, always back up the data and test thoroughly to avoid accidental data loss.

Finally, the author asks readers to like, share, and follow the public account to support further content creation.

Recommended Reading:

One SQL solves the problem – interviewers impressed

MySQL table relationship updates

High‑frequency interview question: multithreaded ordered printing

Comprehensive guide to MySQL index loss scenarios

Design pattern usage tips

SQLdatabaseMySQLsubqueryDELETE JOINDuplicate Records
Full-Stack Internet Architecture
Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.