
Efficient Full‑Table Updates in MySQL: Handling Row‑Based Binlog, Deep Pagination, and Batch Processing

The article explains why executing a full‑table UPDATE on large MySQL tables using row‑based binlog can overload replication, discusses the inefficiencies of deep pagination and IN clauses, and presents a batch‑processing strategy that leverages SQL_NO_CACHE and FORCE INDEX to safely and efficiently migrate data.


When a business needs to migrate data—such as converting stored URLs from http to https —a full‑table UPDATE works for small tables but becomes problematic for tables with hundreds of millions of rows. In a master‑slave MySQL setup that uses row‑based binlog, each row change generates a large amount of binlog data, which then has to be streamed to the replica, causing high I/O, replication lag, and risk of failure.

Binlog Formats

statement: logs the original SQL statement; low log volume, but nondeterministic functions (e.g., RAND()) can produce different results on the replica.

row: logs each row’s before/after image; high log volume for bulk DELETE/UPDATE.

mixed: combines both; MySQL uses statement for simple DML and switches to row for statements that are unsafe to replicate as statements.

Because the production environment uses the row format, a naïve full‑table UPDATE would flood the master’s binlog and block the replica.

Direct UPDATE Problems

An initial attempt used a simple statement:

update tb_user_info set user_img=replace(user_img,'http://','https://');

This generated massive binlog traffic and was deemed unacceptable.

Deep Pagination Issue

Trying to batch the work with a growing LIMIT offset leads to “deep pagination”: to serve LIMIT offset, count, MySQL must read and discard all offset rows before the requested window, so each successive batch re‑scans more of the index and late batches approach a full‑table scan.
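A toy cost model (not MySQL internals, but the proportions match the problem above) makes this visible: serving LIMIT offset, count walks past offset entries before returning anything, whereas seeking by the last seen id examines only the rows it returns.

```python
# Toy model: `examined` counts index entries the server must walk.
# This is an illustrative sketch, not actual MySQL behavior.

def offset_page(ids, offset, limit):
    # LIMIT offset, limit: walk and discard `offset` entries first
    examined = min(offset + limit, len(ids))
    return ids[offset:offset + limit], examined

def keyset_page(ids, last_id, limit):
    # WHERE id > last_id ORDER BY id LIMIT limit: seek, then read `limit` rows
    page = [i for i in ids if i > last_id][:limit]
    return page, len(page)

ids = list(range(1, 100_001))
_, deep = offset_page(ids, 90_000, 1_000)   # walks 91,000 entries
_, seek = keyset_page(ids, 90_000, 1_000)   # reads only the 1,000 returned
```

The deeper the page, the larger `deep` grows, while `seek` stays constant at the batch size; this is exactly why the final solution below advances by id instead of by offset.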

IN‑Clause Inefficiency

Another idea was to collect a batch of IDs and update with WHERE id IN (...) . Although MySQL can resolve the IN list through index lookups, the IDs still have to be collected somehow (paying the pagination cost again), and a large IN list is expensive to parse and execute, so the approach remains slow for large batches.

Final Batch‑Processing Solution

After discussions with DBAs, the following strategy was adopted:

Use SELECT /*!40001 SQL_NO_CACHE */ id FROM tb_user_info FORCE INDEX(`PRIMARY`) WHERE id>"1" ORDER BY id LIMIT 1000,1; to fetch the single id that sits 1,000 rows past the current window start. The scan touches only primary‑key index entries, and the returned id becomes the upper bound of the next batch.

Update the range with UPDATE tb_user_info SET user_img=replace(user_img,'http://','https://') WHERE id > "{start}" AND id < "{end}"; Two details matter here: replacing the full prefix 'http://' (as in the first attempt) avoids corrupting values that are already https, since replacing the bare string 'http' would turn them into 'httpss'; and with strict inequalities on both bounds the boundary id itself is never updated, so in practice one bound should be inclusive.

The SQL_NO_CACHE hint keeps the result set out of the query cache (a feature removed in MySQL 8.0); note that it does not bypass the InnoDB buffer pool, through which index pages are still read. FORCE INDEX(`PRIMARY`) guarantees the primary‑key index is used, and ordering by id ensures sequential index access.


This method allows the update rate to be throttled through an API while replication lag, IOPS, and memory usage are monitored. The process can also be parallelized with a thread pool to increase throughput while still keeping the overall load under control.
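The loop above can be sketched in application code. This is an illustrative sketch, not the article's implementation: a plain dict stands in for tb_user_info so the control flow runs without a MySQL server; in practice fetch_window_end and update_range would execute the two SQL statements shown above (with bound parameters), and throttle_s would come from the throttling API. The sketch uses an inclusive upper bound so boundary rows are not skipped.

```python
import time

BATCH = 1000

def fetch_window_end(ids, start):
    # Stand-in for:
    #   SELECT /*!40001 SQL_NO_CACHE */ id FROM tb_user_info
    #   FORCE INDEX(`PRIMARY`) WHERE id > %s ORDER BY id LIMIT 1000, 1
    window = sorted(i for i in ids if i > start)[:BATCH + 1]
    return window[BATCH] if len(window) > BATCH else None

def update_range(table, start, end):
    # Stand-in for:
    #   UPDATE tb_user_info SET user_img = REPLACE(user_img, 'http://', 'https://')
    #   WHERE id > %s AND id <= %s
    for i, url in table.items():
        if start < i <= end:
            table[i] = url.replace("http://", "https://")

def migrate(table, throttle_s=0.0):
    start = 0
    while True:
        end = fetch_window_end(table.keys(), start)
        if end is None:                    # fewer than BATCH rows remain
            if table:
                update_range(table, start, max(table))
            break
        update_range(table, start, end)
        start = end
        time.sleep(throttle_s)             # throttle to limit replica lag
```

Because each iteration seeks forward by primary key, the work per batch stays constant regardless of how far the migration has progressed, unlike the LIMIT-offset approach.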

Other Considerations

If primary keys come from Snowflake or auto‑increment, rows are naturally ordered by id, which makes range scans straightforward. UUID primary keys are not monotonically ordered, so a separate data‑pre‑processing step is required to build ordered batches before migrating.
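For the UUID case, one possible pre‑processing sketch (hypothetical, not from the article): dump the keys, sort them, and take every 1,000th key as a window boundary, which the same range‑update loop can then consume with lexicographic bounds.

```python
def key_windows(keys, batch=1000):
    # Cut a set of unordered keys into sorted windows of `batch` keys;
    # each returned value is the (inclusive) upper bound of one window.
    ordered = sorted(keys)
    if not ordered:
        return []
    bounds = ordered[batch - 1::batch]     # last key of each full window
    if bounds[-1] != ordered[-1]:
        bounds.append(ordered[-1])         # cover the final partial window
    return bounds
```

With these boundaries, each batch becomes WHERE id > previous_bound AND id <= bound, restoring the sequential access that UUIDs otherwise lack.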

Tags: performance optimization, MySQL, binlog, full table update, force index, SQL_NO_CACHE
Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
