How to Safely Perform Billion-Row MySQL Updates Without Overloading Replication
This article explains why a direct full‑table UPDATE on a massive MySQL table can overwhelm binlog replication, explores the pitfalls of deep pagination and IN‑clauses, and presents a batch‑processing strategy using SQL_NO_CACHE and FORCE INDEX to achieve efficient, low‑impact updates.
When a business needs to run a full‑table UPDATE on a large MySQL table (hundreds of millions of rows), the generated binlog can flood a row‑format replication setup, causing severe performance and stability issues.
Binlog Formats
- statement: records each executed SQL statement; low log volume, but can break replication with nondeterministic functions.
- row: records the before/after image of every changed row, producing huge logs for bulk deletes or updates.
- mixed: combines both, using statement format for simple statements and row format where statement would be unsafe.
Our production MySQL uses row‑format binlog, so a full‑table UPDATE on a billion‑row table would create massive binlogs and block the replica.
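To see why row format is dangerous here, a rough back‑of‑the‑envelope estimate helps (the per‑row image size below is an assumed, illustrative number, not a measurement from the article):

```python
def estimate_row_binlog_bytes(rows, avg_row_bytes, images_per_row=2):
    """Rough binlog volume for a row-format UPDATE.

    Row format logs a before image and an after image for every
    changed row, so volume grows linearly with the row count.
    avg_row_bytes is an assumed average row-image size.
    """
    return rows * avg_row_bytes * images_per_row

# Illustrative: 1 billion rows at an assumed 200 bytes per row image
# yields 400 GB of binlog from a single UPDATE statement.
volume = estimate_row_binlog_bytes(1_000_000_000, 200)
print(volume)  # 400000000000
```

All of that binlog must then be shipped to and replayed on every replica, which is what causes the lag and blocking described above.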
Direct UPDATE Problem
Attempting a simple full‑table UPDATE such as
<code>update tb_user_info set user_img=replace(user_img,'http://','https://');</code>
quickly overwhelms both the master and the replica.
Deep Pagination Issue
Using LIMIT with an increasing offset forces MySQL to read and discard every row before the offset, so deep pages degenerate into near full table scans and become highly inefficient.
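The cost is easy to see with a small simulation: offset pagination re‑reads all rows before each offset, while seeking by primary key (keyset pagination) touches each row once. This is a sketch with simple counting standing in for index access, not a benchmark:

```python
def offset_scan_cost(total_rows, page_size):
    """Rows touched when paging with LIMIT offset, size:
    each page re-reads all rows before its offset."""
    return sum(offset + page_size
               for offset in range(0, total_rows, page_size))

def keyset_scan_cost(total_rows, page_size):
    """Rows touched when seeking with WHERE id > last_id:
    each row is read exactly once."""
    return total_rows

# With 1,000,000 rows in pages of 1,000, offset paging touches
# roughly 500x more rows than keyset paging.
print(offset_scan_cost(1_000_000, 1_000))  # 500500000
print(keyset_scan_cost(1_000_000, 1_000))  # 1000000
```

The quadratic growth of the offset approach is why the final solution below walks the primary key instead of paging by offset.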
IN‑Clause Inefficiency
Fetching a list of IDs and updating with <code>WHERE id IN (...)</code> still performs poorly, even though one might expect the primary key index to make such lookups cheap.
Final Batch‑Update Solution
The effective approach combines two queries: a SELECT that avoids the buffer pool and forces the primary index, followed by an UPDATE that operates on a sorted range of primary keys.
<code>select /*!40001 SQL_NO_CACHE */ id from tb_user_info FORCE INDEX(`PRIMARY`) where id>"1" ORDER BY id limit 1000,1;</code>
<code>update tb_user_info set user_img=replace(user_img,'http://','https://') where id > "{1}" and id < "{2}";</code>
The SELECT fetches the single id that sits 1000 rows past the current position, which becomes the upper bound of the next batch. The <code>/*!40001 SQL_NO_CACHE */</code> hint keeps the scanned cold data from polluting the InnoDB buffer pool, while <code>FORCE INDEX(`PRIMARY`)</code> ensures the primary key index is used, so ordering by <code>id</code> allows efficient range scans.
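Put together, the batch loop looks roughly like the sketch below. It assumes a DB‑API style connection (e.g. from mysql-connector-python); the connection handling and batch size are illustrative, not from the original article, and the sketch uses an inclusive upper bound (<code>id &lt;= %s</code>) so the boundary row is not skipped between batches:

```python
def batched_update(conn, batch_size=1000):
    """Walk the primary key in ascending order, updating one
    bounded range per statement so each transaction stays small."""
    last_id = 0
    cur = conn.cursor()
    while True:
        # Find the id batch_size rows ahead; SQL_NO_CACHE keeps the
        # scan from polluting the buffer pool, FORCE INDEX pins the PK.
        cur.execute(
            "SELECT /*!40001 SQL_NO_CACHE */ id FROM tb_user_info "
            "FORCE INDEX(`PRIMARY`) WHERE id > %s "
            "ORDER BY id LIMIT %s, 1",
            (last_id, batch_size),
        )
        row = cur.fetchone()
        if row is None:
            # Fewer than batch_size rows left: update the tail and stop.
            cur.execute(
                "UPDATE tb_user_info "
                "SET user_img = REPLACE(user_img, 'http://', 'https://') "
                "WHERE id > %s",
                (last_id,),
            )
            conn.commit()
            break
        upper = row[0]
        cur.execute(
            "UPDATE tb_user_info "
            "SET user_img = REPLACE(user_img, 'http://', 'https://') "
            "WHERE id > %s AND id <= %s",
            (last_id, upper),
        )
        conn.commit()
        last_id = upper
```

Each iteration commits a bounded, index‑driven range update, so the binlog is produced in small, replica‑friendly chunks rather than one enormous transaction.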
By controlling the batch size and execution rate through an API (e.g., limiting threads in a thread pool), you can monitor and adjust IOPS, memory usage, and replication lag, achieving safe, incremental updates.
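One way to cap the execution rate is a small fixed‑size thread pool with a pacing delay between submissions; the pool size and sleep interval below are illustrative knobs you would tune against observed IOPS and replication lag, not values from the article:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_batches(batch_jobs, max_workers=4, interval=0.05):
    """Execute batch-update jobs with bounded concurrency and a
    fixed pacing interval, keeping load on the master adjustable."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for job in batch_jobs:
            futures.append(pool.submit(job))
            time.sleep(interval)  # throttle the submission rate
        for f in futures:
            results.append(f.result())
    return results

# Illustrative usage: each job would run one bounded UPDATE batch;
# stubs stand in for real database calls here.
done = run_batches([(lambda i=i: i) for i in range(5)], interval=0.01)
print(done)  # [0, 1, 2, 3, 4]
```

Raising `interval` or lowering `max_workers` slows the update down when the replica starts lagging; an external API can adjust these values while the job runs.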
Other Considerations
If primary keys are generated with Snowflake or auto‑increment, their natural ordering suits this method; for unordered keys such as UUIDs, the data must be sorted or otherwise pre‑processed before the bulk update.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.