Databases 12 min read

How to Quickly Find Missing Auto‑Increment Primary Key Values in MySQL

This article explains several efficient methods—including shell scripts, generated series tables, JSON functions, and MySQL 8.0 recursive CTEs—to quickly locate missing auto‑increment primary key values in a MySQL table, comparing their performance and suitability.

Aikesheng Open Source Community

Jan 12, 2021

How to Quickly Find Missing Auto‑Increment Primary Key Values in MySQL

In many MySQL applications the primary key is defined as an auto‑increment column, but deletions or other operations can leave gaps in the sequence. The article starts by showing a simple but inefficient approach using INSERT IGNORE, which requires repeated retries when trying to reuse those missing IDs.

Shell‑side solution : By exporting the existing IDs to a file and generating a full list of numbers with seq, the missing IDs can be obtained with a grep -vwf pipeline. Example commands are provided to extract IDs from the table ytt_t0, generate the full range up to the maximum ID, and compute the difference, even for large ranges such as 100 000 IDs.

root@debian-ytt1:/var/lib/mysql# mysql -S /tmp/mysqld_3306.sock -D ytt_big -e "select id from ytt_t0" -ss > ytt_t0_ids.txt
root@debian-ytt1:/var/lib/mysql# for i in `seq 1 28`; do echo $i >> ids.txt; done
root@debian-ytt1:/var/lib/mysql# grep -vwf ytt_t0_ids.txt ids.txt | sed ':label;N;s/
/,/;b label'
# output: 5,6,7,8,9,10,11,16,17,18,20,21,22,23,24,25,26

Database‑side solution 1 – series table : Create a helper table ytt_seq that stores a continuous series of numbers up to the maximum ID, then join it with the original table to find NULL entries. The method works on all MySQL versions, though generating the series can be time‑consuming.

CREATE TABLE ytt_seq(id SERIAL PRIMARY KEY);
-- stored procedure to fill the table up to @max_id
DELIMITER $$
CREATE PROCEDURE sp_generate_series(f_tbname VARCHAR(64), f_max_id BIGINT UNSIGNED)
BEGIN
  DECLARE i BIGINT DEFAULT 0;
  TRUNCATE ytt_seq;
  WHILE i < f_max_id DO
    INSERT INTO ytt_seq VALUES (NULL);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL sp_generate_series('ytt_seq', @max_id);
SELECT GROUP_CONCAT(ytt_seq.id) AS result
FROM ytt_t0 RIGHT JOIN ytt_seq USING (id)
WHERE ytt_t0.id IS NULL;

Database‑side solution 2 – JSON functions : For MySQL 8.0 the article demonstrates building a JSON array of all existing IDs, generating a JSON series of numbers, and using a custom function that leverages JSON_OVERLAPS to filter out the present IDs. This approach proved extremely slow (over 10 minutes) for the sample data and was abandoned.

-- generate JSON array of existing IDs
SELECT @arr1 FROM (SELECT @arr1 := JSON_ARRAY_APPEND(@arr1,'$',id) FROM ytt_t0, (SELECT @arr1 := '[]') b) T LIMIT 1;
-- function to generate full JSON series
CREATE FUNCTION func_generate_series_json(f_max_id BIGINT UNSIGNED) RETURNS JSON
BEGIN
  DECLARE v_result JSON DEFAULT '[]';
  DECLARE i BIGINT UNSIGNED DEFAULT 1;
  WHILE i <= f_max_id DO
    SET v_result = JSON_ARRAY_APPEND(v_result,'$',i);
    SET i = i + 1;
  END WHILE;
  RETURN v_result;
END;
-- function to get non‑overlapping values
CREATE FUNCTION func_get_json_common_values(f_str JSON, f_sub_str JSON) RETURNS JSON
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE v_len INT;
  DECLARE v_tmp JSON;
  DECLARE v_result JSON DEFAULT '[]';
  SET v_len = JSON_LENGTH(f_str);
  WHILE i < v_len DO
    SET v_tmp = JSON_EXTRACT(f_str, CONCAT('$[', i, ']'));
    IF JSON_OVERLAPS(v_tmp, f_sub_str) = 0 THEN
      SET v_result = JSON_ARRAY_APPEND(v_result,'$',v_tmp);
    END IF;
    SET i = i + 1;
  END WHILE;
  RETURN v_result;
END;

SELECT func_get_json_common_values(func_generate_series_json(@max_id), @arr1) AS result;

Database‑side solution 3 – recursive CTE (MySQL 8.0) : The most efficient method uses a recursive common table expression to generate the full numeric series on the fly and left‑join it with the target table. A single query returns all missing IDs in milliseconds, making it the preferred solution when MySQL 8.0 is available.

WITH RECURSIVE tmp (id) AS (
  SELECT 1 UNION ALL
  SELECT id + 1 FROM tmp WHERE id < (SELECT MAX(id) FROM ytt_t0)
)
SELECT id AS gap_result
FROM tmp LEFT JOIN ytt_t0 USING (id)
WHERE ytt_t0.id IS NULL;

In summary, if you can upgrade to MySQL 8.0, the recursive CTE approach is the fastest and simplest. For earlier versions, generating a series table and joining it is reliable, while shell‑side processing remains a practical alternative for very large ID ranges.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL MySQL primary key CTE gap detection

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.