Databases 4 min read

Using DISTINCT, GROUP BY, and ROW_NUMBER for De‑duplication in SQL

This article explains how to remove duplicate rows in SQL by using DISTINCT, GROUP BY, and the ROW_NUMBER window function, compares their syntax and performance, and provides concrete examples with a Task table and a Test helper table.

Laravel Tech Community

Nov 11, 2021

Using DISTINCT, GROUP BY, and ROW_NUMBER for De‑duplication in SQL

When extracting data with SQL, duplicate records often appear, such as when calculating unique visitors (UV), requiring de‑duplication.

In MySQL the typical approaches are DISTINCT or GROUP BY, while databases that support window functions (e.g., Hive, Oracle) can also use ROW_NUMBER() for more flexible de‑duplication.

Consider a table Task with columns task_id (task identifier), order_id (order identifier) and start_time. One task may correspond to multiple orders, so task_id is not unique.

To obtain the total number of distinct tasks, the following methods can be used:

DISTINCT

-- list all unique task_id values
-- select distinct task_id from Task;

-- total number of tasks
select count(distinct task_id) as task_num
from Task;

The DISTINCT clause is simple but can be less efficient and cannot be used to display the distinct rows together with other columns without additional handling.

GROUP BY

-- list unique task_id values (null is also a value)
-- select task_id from Task group by task_id;

-- total number of tasks
select count(task_id) as task_num
from (
    select task_id
    from Task
    group by task_id
) tmp;

ROW_NUMBER() (window function)

-- use in databases that support window functions
select count(case when rn = 1 then task_id else null end) as task_num
from (
    select task_id,
           row_number() over (partition by task_id order by start_time) as rn
    from Task
) tmp;

An additional example with a helper table Test demonstrates how DISTINCT and GROUP BY behave differently when multiple columns are selected.

select distinct user_id from Test;                     -- returns 1, 2
select distinct user_id, user_type from Test;          -- returns 1 1, 1 2, 2 1
select user_id from Test group by user_id;             -- returns 1, 2
select user_id, user_type from Test group by user_id, user_type; -- returns 1 1, 1 2, 2 1

These snippets illustrate the syntax and performance considerations of each de‑duplication technique.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL Deduplication databases window-functions GROUP BY DISTINCT ROW_NUMBER

Written by

Laravel Tech Community

Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.