
Efficient Full‑Table Update Strategies for MySQL with Row‑Based Binlog Replication

This article explains the challenges of performing full‑table updates on large MySQL tables using row‑based binlog replication and presents a step‑by‑step batch‑update strategy that leverages SQL_NO_CACHE, FORCE INDEX, and controlled pagination to minimize binlog volume and maintain performance.


Hello everyone, I am Chen, and today I will share how to handle full‑table updates in MySQL when the data volume reaches hundreds of millions of rows.

When a table is small (tens of thousands of rows), a simple UPDATE works fine. With large tables, however, the MySQL master‑slave architecture replicates through the binlog, and the binlog format (statement, row, mixed) largely determines how much data an UPDATE generates. The three formats:

statement – Logs each executed SQL statement on the master; log volume is low, but nondeterministic functions such as RAND() can cause master‑slave inconsistencies.

row – Logs the before‑and‑after image of every changed row; log volume can be huge for bulk deletes or updates.

mixed – Combines both: statement for simple statements, row for statements involving nondeterministic functions.

Our production MySQL uses the row format. Executing a full‑table UPDATE on a table with hundreds of millions of rows would generate massive binlogs, causing the slave to block while applying them, which is unacceptable. This article walks through the evolution from a naïve UPDATE to a safe, batch‑processing solution.

Problems with Direct UPDATE

We needed to replace http:// with https:// in the user‑image column of a table containing tens of millions of rows. The initial attempt was a single statement:

update tb_user_info set user_img=replace(user_img,'http://','https://');

This runs as one huge transaction: every row stays locked until commit, and under the row format the binlog receives a before‑and‑after image for every row in the table, all of which must then be shipped to and replayed on the slave in one shot.
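A back‑of‑envelope estimate makes the binlog problem concrete. The row count and average row‑image size below are illustrative assumptions, not figures from the article:

```python
rows = 50_000_000     # assumed table size (tens of millions of rows)
avg_row_bytes = 500   # assumed average size of one row image

# Row-format binlog stores a before image and an after image per updated row.
binlog_bytes = rows * 2 * avg_row_bytes
print(binlog_bytes / 1024**3)  # roughly 46.6 GiB from a single UPDATE
```

Tens of gigabytes of binlog produced in one transaction is what makes the slave fall behind while replaying it.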

Deep Pagination Issue

Running the update in batches with LIMIT seems reasonable, but LIMIT with an offset is implemented by scanning the index sequentially and discarding the first offset rows. As the offset grows, each batch degrades toward a full‑table scan: the well‑known "deep pagination" problem. (Note also that MySQL's UPDATE ... LIMIT accepts only a row count, not an offset, so the statement below would be rejected as written.)

update tb_user_info set user_img=replace(user_img,'http://','https://') limit 1,1000;
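The cost difference between offset and keyset pagination can be sketched with sqlite3 standing in for MySQL (table and column names follow the article; the row data is made up for illustration). Both queries return the same page, but the offset form must walk past and discard all skipped rows, while the keyset form seeks directly through the primary key:

```python
import sqlite3

# In-memory stand-in for the article's tb_user_info table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_user_info (id INTEGER PRIMARY KEY, user_img TEXT)")
conn.executemany(
    "INSERT INTO tb_user_info VALUES (?, ?)",
    [(i, f"http://img.example/{i}.png") for i in range(1, 10_001)],
)

# Offset pagination: the engine walks past 5000 index entries
# before returning 1000 rows -- cost grows linearly with the offset.
offset_page = conn.execute(
    "SELECT id FROM tb_user_info ORDER BY id LIMIT 1000 OFFSET 5000"
).fetchall()

# Keyset pagination: seek straight to id > 5000 via the primary key.
keyset_page = conn.execute(
    "SELECT id FROM tb_user_info WHERE id > 5000 ORDER BY id LIMIT 1000"
).fetchall()

print(offset_page == keyset_page)  # same rows, very different scan cost
```

This seek‑by‑last‑id idea is exactly what the final version below builds on.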

Efficiency of IN

Another idea was to collect a batch of primary‑key IDs and update with WHERE id IN (...). Although MySQL can resolve IN over the primary key with index lookups, the approach still scales poorly: every batch needs an extra SELECT round trip, the UPDATE statement grows with the ID list, and the lookups are scattered point reads rather than one contiguous range scan.

select * from tb_user_info where id > {index} limit 100;

update tb_user_info set user_img=replace(user_img,'http://','https://') where id in ({id1},{id2},{id3});

Final Version

After discussions with the DBA, we settled on the following pattern, which uses two hints:

/*!40001 SQL_NO_CACHE */ – tells MySQL not to store the result in the query cache (the hint concerns the query cache, which was removed in MySQL 8.0, not the InnoDB buffer pool), so the repeated boundary scans do not evict hot cached results.

FORCE INDEX(PRIMARY) – forces the optimizer to use the primary‑key index so each batch is read as one ordered range scan.

We first find the upper boundary ID of the batch, then update the range directly:

select /*!40001 SQL_NO_CACHE */ id from tb_user_info FORCE INDEX(`PRIMARY`) where id > {1} ORDER BY id limit 1000,1;

update tb_user_info set user_img=replace(user_img,'http://','https://') where id > {1} and id <= {2};

The SELECT skips ahead 1000 rows from the last processed ID {1} and returns a single ID {2}, the batch's upper boundary. Using id <= {2} in the UPDATE ensures the boundary row itself is updated, so the next batch can safely resume with id > {2}.

The SQL_NO_CACHE hint keeps the results of these repeated boundary scans out of the query cache, and forcing the primary‑key index ensures the optimizer never falls back to a less efficient plan, so every batch stays a short, bounded range scan with a correspondingly small binlog footprint.
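Putting the two statements together, the driver loop looks roughly like this. It is a sketch using Python's sqlite3 as a stand‑in for MySQL, so the MySQL‑specific SQL_NO_CACHE and FORCE INDEX hints are omitted; the batch size and sample data are illustrative:

```python
import sqlite3

def https_migrate(conn, batch_size=1000):
    """Batch-update user_img from http:// to https:// by primary-key range."""
    last_id = 0
    while True:
        # Find the id batch_size rows past last_id; it bounds this batch.
        # (In MySQL this is the SELECT ... LIMIT 1000,1 boundary query.)
        row = conn.execute(
            "SELECT id FROM tb_user_info WHERE id > ? "
            "ORDER BY id LIMIT 1 OFFSET ?",
            (last_id, batch_size),
        ).fetchone()
        if row is None:
            # Fewer than batch_size rows remain: finish with an open range.
            conn.execute(
                "UPDATE tb_user_info "
                "SET user_img = replace(user_img, 'http://', 'https://') "
                "WHERE id > ?",
                (last_id,),
            )
            conn.commit()
            break
        upper = row[0]
        # Closed range (last_id, upper]; the next batch resumes after upper.
        conn.execute(
            "UPDATE tb_user_info "
            "SET user_img = replace(user_img, 'http://', 'https://') "
            "WHERE id > ? AND id <= ?",
            (last_id, upper),
        )
        conn.commit()
        last_id = upper

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_user_info (id INTEGER PRIMARY KEY, user_img TEXT)")
conn.executemany(
    "INSERT INTO tb_user_info VALUES (?, ?)",
    [(i, f"http://img.example/{i}.png") for i in range(1, 2501)],
)
https_migrate(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT count(*) FROM tb_user_info WHERE user_img LIKE 'http://%'"
).fetchone()[0]
print(remaining)  # 0
```

Committing after each range keeps every transaction (and its binlog event group) small, which is the whole point of the batching.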

By exposing this logic through an API, we can control the update rate, monitor master‑slave lag, IOPS, and memory usage, and optionally run multiple threads via a thread pool to increase throughput while keeping the replication load manageable.
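One way to sketch the multi‑threaded variant: partition the id space into disjoint ranges and hand each range to a pool worker. Everything here is illustrative — in production each worker would run the range UPDATE above on its own connection, with a sleep or token bucket to cap the rate:

```python
from concurrent.futures import ThreadPoolExecutor

def partition(min_id, max_id, chunk):
    """Split [min_id, max_id] into disjoint, contiguous (lo, hi] ranges."""
    ranges = []
    lo = min_id - 1
    while lo < max_id:
        hi = min(lo + chunk, max_id)
        ranges.append((lo, hi))
        lo = hi
    return ranges

def worker(rng):
    lo, hi = rng
    # Placeholder for: UPDATE tb_user_info SET ... WHERE id > lo AND id <= hi
    return hi - lo  # ids this worker covers

ranges = partition(1, 10_000, 1000)
with ThreadPoolExecutor(max_workers=4) as pool:
    covered = sum(pool.map(worker, ranges))
print(len(ranges), covered)  # 10 ranges covering all 10000 ids
```

Because the ranges are disjoint, workers never touch the same rows, so they cannot deadlock on row locks; with sparse Snowflake ids the ranges stay valid, just uneven in row count.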

Other Considerations

If primary keys come from an auto‑increment column or a Snowflake‑style generator, key order matches insertion order, so the range‑based batching above walks the table in insertion order. With random UUID primary keys that property is lost, so the data must be pre‑processed before the bulk update — in particular, new records must already be written with the replacement applied, so rows inserted mid‑migration are not missed.

Source: https://juejin.cn/post/6897185211340029966

Final Note (Please Follow)

Chen publishes three technical columns compiled into PDFs, available by following the public account "码猿技术专栏" and replying with the matching keyword:

Spring Cloud 进阶 – reply "Spring Cloud 进阶" to get the PDF.

Spring Boot 进阶 – reply "Spring Boot进阶" to get the PDF.

Mybatis 进阶 – reply "Mybatis 进阶" to get the PDF.

Follow the public account for more technical content and join the discussion group by replying “加群”.

Tags: SQL, MySQL, Binlog, Database Optimization, Full Table Update
Written by Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
