
Efficient Full‑Table Updates in MySQL: Handling Row‑Based Binlog, Deep Pagination, and Batch Processing

The article explains why executing a full‑table UPDATE on large MySQL tables using row‑based binlog can overload replication, discusses the inefficiencies of deep pagination and IN clauses, and presents a batch‑processing strategy that leverages SQL_NO_CACHE and FORCE INDEX to safely and efficiently migrate data.


When a business needs to migrate data—such as converting stored URLs from http to https —a full‑table UPDATE works for small tables but becomes problematic for tables with hundreds of millions of rows. In a master‑slave MySQL setup that uses row‑based binlog, each row change generates a large amount of binlog data, which then has to be streamed to the replica, causing high I/O, replication lag, and risk of failure.

Binlog Formats

statement: logs the original SQL statement; low log volume, but nondeterministic functions (e.g., RAND()) can produce different results on the replica.

row: logs each row’s before/after image; high log volume for bulk DELETE/UPDATE.

mixed: combines both; MySQL uses statement for simple DML and switches to row for statements that are unsafe to replicate as statements.

Because the production environment uses the row format, a naïve full‑table UPDATE would flood the master’s binlog and block the replica.

Direct UPDATE Problems

An initial attempt used a simple statement:

update tb_user_info set user_img=replace(user_img,'http://','https://');

This generated massive binlog traffic and was deemed unacceptable.

Deep Pagination Issue

Trying to batch the work with a growing LIMIT offset leads to “deep pagination”: to serve LIMIT offset, count, MySQL must read and discard all offset rows before the requested window, so each successive batch re‑scans more of the index and late batches approach a full‑table scan.
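A toy cost model (not MySQL internals, but the proportions match the problem above) makes this visible: serving LIMIT offset, count walks past offset entries before returning anything, whereas seeking by the last seen id examines only the rows it returns.

```python
# Toy model: `examined` counts index entries the server must walk.
# This is an illustrative sketch, not actual MySQL behavior.

def offset_page(ids, offset, limit):
    # LIMIT offset, limit: walk and discard `offset` entries first
    examined = min(offset + limit, len(ids))
    return ids[offset:offset + limit], examined

def keyset_page(ids, last_id, limit):
    # WHERE id > last_id ORDER BY id LIMIT limit: seek, then read `limit` rows
    page = [i for i in ids if i > last_id][:limit]
    return page, len(page)

ids = list(range(1, 100_001))
_, deep = offset_page(ids, 90_000, 1_000)   # walks 91,000 entries
_, seek = keyset_page(ids, 90_000, 1_000)   # reads only the 1,000 returned
```

The deeper the page, the larger `deep` grows, while `seek` stays constant at the batch size; this is exactly why the final solution below advances by id instead of by offset.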

IN‑Clause Inefficiency

Another idea was to collect a batch of IDs and update with WHERE id IN (...) . Although MySQL can resolve the IN list through index lookups, the IDs still have to be collected somehow (paying the pagination cost again), and a large IN list is expensive to parse and execute, so the approach remains slow for large batches.

Final Batch‑Processing Solution

After discussions with DBAs, the following strategy was adopted:

Use SELECT /*!40001 SQL_NO_CACHE */ id FROM tb_user_info FORCE INDEX(`PRIMARY`) WHERE id>"1" ORDER BY id LIMIT 1000,1; to fetch the single id that sits 1,000 rows past the current window start. The scan touches only primary‑key index entries, and the returned id becomes the upper bound of the next batch.

Update the range with UPDATE tb_user_info SET user_img=replace(user_img,'http://','https://') WHERE id > "{start}" AND id < "{end}"; Two details matter here: replacing the full prefix 'http://' (as in the first attempt) avoids corrupting values that are already https, since replacing the bare string 'http' would turn them into 'httpss'; and with strict inequalities on both bounds the boundary id itself is never updated, so in practice one bound should be inclusive.

The SQL_NO_CACHE hint keeps the result set out of the query cache (a feature removed in MySQL 8.0); note that it does not bypass the InnoDB buffer pool, through which index pages are still read. FORCE INDEX(`PRIMARY`) guarantees the primary‑key index is used, and ordering by id ensures sequential index access.


This method allows the update rate to be throttled through an API while replication lag, IOPS, and memory usage are monitored. The process can also be parallelized with a thread pool to increase throughput while still keeping the overall load under control.
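The loop above can be sketched in application code. This is an illustrative sketch, not the article's implementation: a plain dict stands in for tb_user_info so the control flow runs without a MySQL server; in practice fetch_window_end and update_range would execute the two SQL statements shown above (with bound parameters), and throttle_s would come from the throttling API. The sketch uses an inclusive upper bound so boundary rows are not skipped.

```python
import time

BATCH = 1000

def fetch_window_end(ids, start):
    # Stand-in for:
    #   SELECT /*!40001 SQL_NO_CACHE */ id FROM tb_user_info
    #   FORCE INDEX(`PRIMARY`) WHERE id > %s ORDER BY id LIMIT 1000, 1
    window = sorted(i for i in ids if i > start)[:BATCH + 1]
    return window[BATCH] if len(window) > BATCH else None

def update_range(table, start, end):
    # Stand-in for:
    #   UPDATE tb_user_info SET user_img = REPLACE(user_img, 'http://', 'https://')
    #   WHERE id > %s AND id <= %s
    for i, url in table.items():
        if start < i <= end:
            table[i] = url.replace("http://", "https://")

def migrate(table, throttle_s=0.0):
    start = 0
    while True:
        end = fetch_window_end(table.keys(), start)
        if end is None:                    # fewer than BATCH rows remain
            if table:
                update_range(table, start, max(table))
            break
        update_range(table, start, end)
        start = end
        time.sleep(throttle_s)             # throttle to limit replica lag
```

Because each iteration seeks forward by primary key, the work per batch stays constant regardless of how far the migration has progressed, unlike the LIMIT-offset approach.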

Other Considerations

If primary keys come from Snowflake or auto‑increment, rows are naturally ordered by id, which makes range scans straightforward. UUID primary keys are not monotonically ordered, so a separate data‑pre‑processing step is required to build ordered batches before migrating.
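For the UUID case, one possible pre‑processing sketch (hypothetical, not from the article): dump the keys, sort them, and take every 1,000th key as a window boundary, which the same range‑update loop can then consume with lexicographic bounds.

```python
def key_windows(keys, batch=1000):
    # Cut a set of unordered keys into sorted windows of `batch` keys;
    # each returned value is the (inclusive) upper bound of one window.
    ordered = sorted(keys)
    if not ordered:
        return []
    bounds = ordered[batch - 1::batch]     # last key of each full window
    if bounds[-1] != ordered[-1]:
        bounds.append(ordered[-1])         # cover the final partial window
    return bounds
```

With these boundaries, each batch becomes WHERE id > previous_bound AND id <= bound, restoring the sequential access that UUIDs otherwise lack.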

Tags: performance optimization, MySQL, binlog, full table update, force index, SQL_NO_CACHE
Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
