Adding Unique Indexes in MySQL: Methods, Risks, and Hook‑Based Mitigations
This article explains how to add unique indexes to large MySQL tables using Online DDL, pt‑osc, and gh‑ost, compares their advantages and drawbacks, outlines common data‑loss risks, and provides practical hook scripts and best‑practice recommendations to mitigate those risks.
In MySQL 5.6 and later, adding a unique index can be performed without rebuilding the table by using Online DDL, but for large tables third‑party tools such as pt‑online‑schema‑change (pt‑osc) and gh‑ost are often preferred.
Online DDL
Online DDL builds the index in place without blocking DML and, for a secondary index, without rebuilding the table. The operation consists of a full scan of the existing rows followed by an incremental phase that replays the row log of changes made while the scan ran. An MDL lock is taken during the Prepare and Commit phases, but it is released for the Execute phase, so the impact on concurrent DML is minimal.
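As a concrete illustration, an in-place unique index build looks like the following; the schema, table, column, and index names are placeholders:

```sql
-- In-place build; if the column already contains duplicates the statement
-- fails with a duplicate-key error (ERROR 1062) instead of silently
-- dropping rows. db1.user, email, and uniq_email are illustrative names.
ALTER TABLE db1.user
    ADD UNIQUE INDEX uniq_email (email),
    ALGORITHM=INPLACE, LOCK=NONE;
```

Note that this failure mode is actually a safety property: unlike the copy tools discussed below, Online DDL refuses to complete rather than lose rows.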
pt‑osc
pt‑osc creates a new table with the same structure, copies the data with INSERT LOW_PRIORITY IGNORE INTO, and applies concurrent changes through triggers that use REPLACE INTO. It performs no duplicate‑value validation: duplicates are silently ignored during the full copy and overwritten by the incremental trigger writes. The tool has no built‑in safeguard against data loss when adding a unique index.
gh‑ost
gh‑ost follows a similar copy‑then‑swap approach but supports a hook mechanism that can run custom scripts before and after the copy phase. Like pt‑osc, it does not automatically reject duplicate values, but the hook can be used to verify uniqueness and abort the migration if necessary.
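Hooks are plain executable files placed in a directory passed via --hooks-path; gh-ost invokes them by naming convention (for example gh-ost-on-rowcount-complete) and exports migration state to them in GH_OST_* environment variables. An illustrative invocation, with connection details and object names as placeholders:

```shell
# Hypothetical gh-ost run wiring in a hooks directory; host, schema,
# table, and index names are placeholders.
gh-ost \
  --host=127.0.0.1 \
  --user=ghost \
  --ask-pass \
  --database=db1 \
  --table=user \
  --alter="ADD UNIQUE INDEX uniq_email (email)" \
  --hooks-path=/opt/ghost-hooks \
  --execute
```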
Comparison of Schemes
| Scheme | Data Loss | Recommendation |
| --- | --- | --- |
| Online DDL | No | Suitable for small tables or when replica lag is acceptable |
| pt‑osc | Possible | Not recommended for adding unique indexes |
| gh‑ost | Possible (mitigated with hooks) | Preferred for large tables when combined with uniqueness checks |
Risks When Adding Unique Indexes
Three typical data‑loss scenarios are identified:
Adding a new column with a unique index when the column has a default value: every existing row receives the same default, so all but one row can be silently collapsed.
The original table already contains duplicate values in the indexed column: the migration keeps only one row per value and drops the rest.
DML executed during the migration introduces duplicate values that overwrite existing rows, particularly around the cut‑over.
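The second scenario can be detected before any migration starts with a plain aggregation query; the table and column names below are placeholders:

```sql
-- List indexed-column values that occur more than once.
-- db1.user and email are illustrative names.
SELECT email, COUNT(*) AS cnt
FROM db1.user
GROUP BY email
HAVING cnt > 1;
```

An empty result set means the column is currently unique; it does not protect against duplicates introduced by DML while the migration runs, which is what the hooks below address.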
Risk Mitigation Strategies
Prohibit mixing other schema changes with the unique‑index addition.
Use a hook to run a uniqueness‑check SQL before the migration and abort if duplicates exist.
When using gh‑ost, enable the hook to compare the estimated row count with the actual copied rows and stop the migration if they differ.
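The uniqueness‑check step above can be sketched as a small bash helper that parses a `count(distinct …):count(*)` result string; the function name and sample values are illustrative, not part of any tool:

```shell
#!/bin/bash
# Sketch of a pre-migration uniqueness check. The check SQL is assumed to
# return "count(distinct col):count(*)" as one colon-separated value;
# function and variable names are illustrative.

f_check_unique() {
    # $1 is the "distinct:total" string produced by the check SQL
    local distinct_count total_count
    distinct_count="${1%%:*}"
    total_count="${1##*:}"
    if [ -z "${distinct_count}" ] || [ -z "${total_count}" ]; then
        echo "check query returned no result" >&2
        return 2
    fi
    if [ "${distinct_count}" -eq "${total_count}" ]; then
        echo "column values are unique (${total_count} rows)"
        return 0
    fi
    echo "duplicates present: ${distinct_count} distinct of ${total_count} rows" >&2
    return 1
}

# In a real hook the input would come from something like:
#   count_info="$(mysql -NBe "SELECT CONCAT(COUNT(DISTINCT email), ':', COUNT(*)) FROM db1.user")"
f_check_unique "1000:1000"   # equal counts: the column is unique
```

A non-zero return value would be used to abort the migration before any rows are copied.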
Hook Example (Basic)
The tool versions used in the examples:

```shell
# ./pt-online-schema-change --version
pt-online-schema-change 3.0.13
# ./bin/gh-ost --version
1.1.5
```

A simple hook script (gh-ost-on-rowcount-complete-hook) records the estimated row count, while gh-ost-on-row-copy-complete-hook compares it with the actual number of copied rows and aborts on mismatch.
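The comparison made by that hook pair can be reduced to a single check. The sketch below assumes gh-ost's GH_OST_ESTIMATED_ROWS and GH_OST_COPIED_ROWS hook environment variables; the function name and demo values are illustrative:

```shell
#!/bin/bash
# Minimal sketch of the check a gh-ost-on-row-copy-complete hook performs:
# compare the row-count estimate with the rows actually copied, and exit
# non-zero on mismatch so the migration is aborted.

f_rowcount_check() {
    local estimated="$1" copied="$2"
    # Non-numeric or empty inputs make the test fail, which is treated
    # as a mismatch rather than a pass.
    if [ "${estimated}" -eq "${copied}" ] 2>/dev/null; then
        echo "row counts match: ${copied} rows copied"
        return 0
    fi
    echo "mismatch: estimated=${estimated} copied=${copied}" >&2
    return 1
}

# Inside the real hook this would be:
#   f_rowcount_check "${GH_OST_ESTIMATED_ROWS}" "${GH_OST_COPIED_ROWS}" || exit 1
f_rowcount_check 100000 100000
```

Because the estimate comes from table statistics, a production hook would normally allow a small tolerance rather than require exact equality; the sketch keeps the strict comparison the article describes.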
Enhanced Hook Example
The enhanced script performs the following steps:
Loads configuration and logging utilities.
Executes a custom SQL that returns count(distinct …):count(*) for the columns to be indexed.
Parses the result and verifies that the distinct count matches the total row count, indicating true uniqueness.
Logs success or failure and exits with an appropriate status code.
```shell
#!/bin/bash
work_dir="/opt/soft/zzonlineddl"
# load logging helpers
. ${work_dir}/function/log/f_logging.sh
# load hook configuration
hook_conf="${work_dir}/hook/conf/--mysql_port--_${GH_OST_DATABASE_NAME}.${GH_OST_TABLE_NAME}"
. ${hook_conf}

function f_main(){
    # run the uniqueness-check SQL configured for this table
    count_info="$( ${mysql_comm} -NBe "${mysql_sql}" )"
    count_total="$(awk -F: '{print $NF}' <<< "${count_info}")"
    if [ -z "${count_total}" ]; then
        f_logging "$(date '+%F %T')":ERROR "Unique index field check failed, aborting"
        exit 1
    fi
    # compare the row count from the check SQL with the rows gh-ost copied
    if [ "${count_total}" -eq "${GH_OST_COPIED_ROWS}" ]; then
        f_logging "$(date '+%F %T')":INFO "Uniqueness verified, proceeding"
    else
        f_logging "$(date '+%F %T')":ERROR "Potential data loss detected, aborting"
        exit 1
    fi
    exit 0
}
f_main
```

The script is not a universal solution; extreme cases, such as hooks that run long on very large tables or rows that are inserted and then deleted during the copy, can still cause data loss.
Testing with gh‑ost
Using the --test-on-replica flag runs the migration on a replica, stops replication, swaps tables, reverses the swap, and leaves both tables in sync for manual verification. However, this does not eliminate the time gap between cut‑over and verification, so duplicate inserts during that window remain a risk.
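Such a dry run can be sketched as follows; the replica host and object names are placeholders:

```shell
# Runs the full migration on the replica, then swaps the tables back,
# leaving the original table and the ghost table (_user_gho) in place
# for manual comparison. Connection details are placeholders.
gh-ost \
  --test-on-replica \
  --host=replica.example.com \
  --database=db1 \
  --table=user \
  --alter="ADD UNIQUE INDEX uniq_email (email)" \
  --execute
```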
Final Recommendations
If replica lag is acceptable, prefer Online DDL for small tables.
When using third‑party tools, choose gh‑ost with hooks and always verify column uniqueness before migration.
Avoid dropping the old table immediately; keep it as a fallback (use --no-drop-old-table for pt‑osc and avoid --ok-to-drop-table for gh‑ost).
Consider the impact on write performance: unique indexes cannot use the InnoDB change buffer, so they may be slower than regular secondary indexes.
Overall, adding a unique index to a large MySQL table is feasible but requires careful planning, risk assessment, and appropriate tooling to prevent data loss.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.