Databases 19 min read

Understanding MySQL Multi‑Threaded Slave (MTS) Parallel Replication: Schema, Commit‑Parent, and Lock‑Interval Methods

This article explains MySQL's evolution of parallel replication—from schema‑based MTS in 5.6, through commit‑parent based replication in 5.7, to the more efficient lock‑interval approach—detailing the underlying binlog group‑commit mechanisms, master‑side support, and the slave‑side worker coordination that together reduce replication lag.

Tencent Database Technology

Dec 5, 2017

Understanding MySQL Multi‑Threaded Slave (MTS) Parallel Replication: Schema, Commit‑Parent, and Lock‑Interval Methods

1. Introduction

MySQL 5.6 introduced schema‑based parallel replication (MTS), allowing binlog events that affect different schemas, are not DDL, and have no foreign‑key dependencies to be replayed concurrently on the slave. The slave still has a single I/O thread that fetches the binlog and writes a relay log, while a coordinator thread decides whether each event can be dispatched to a worker thread. The number of workers is controlled by slave_parallel_workers. Events that cannot be dispatched are replayed by the coordinator after all workers finish.

Schema‑based MTS reduces replication lag caused by single‑threaded replay, but it does not help when the instance contains only one schema, which is a common case.

MySQL 5.7 first implemented commit‑parent based parallel replication, removing the schema restriction and greatly improving replay efficiency. The latest code further optimizes parallel replication using a lock‑interval based method, offering even higher concurrency.

2. Commit‑Parent Based Parallel Replication

The commit‑parent method works on the principle that if two transactions can prepare successfully on the master at the same time, they have no conflict and can be replayed in parallel on the slave.

To let the slave know whether two transactions prepared simultaneously, the master records a prepare timestamp in the binlog. MySQL divides the timeline into small time slices; each transaction’s prepare timestamp marks the start of a slice, and its commit ends the slice. The prepare timestamp is implemented as a logical, monotonically increasing counter called the commit‑parent.

If two transactions share the same commit‑parent, they prepared in the same time slice and can be replayed concurrently. The slave simply checks whether the current transaction’s commit‑parent matches the one of the transaction already executing.

Example:

Trx1 ------------P----------C-------------------------------->
                |
Trx2 ----------------P------+---C---------------------------->
                |   |
Trx3 -------------------P---+---+-----C---------------------->
                |   |     |
Trx4 -----------------------+-P-+-----+----C----------------->
                |   |     |    |
Trx5 -----------------------+---+-P---+----+---C------------->
                |   |     |    |   |
Trx6 -----------------------+---+---P-+----+---+---C---------->
                |   |     |    |   |   |
Trx7 -----------------------+---+-----+----+---+-P-+--C------->
                |   |     |    |   |   |  |

In this diagram, Trx1‑Trx3 can be replayed in parallel, as can Trx5‑Trx6.

3. Lock‑Interval Based Parallel Replication

Even when transactions prepare successfully, they may still be serialized by the coordinator if they acquire locks that conflict. The lock‑interval method defines a transaction’s lock‑interval as the period from acquiring its last required lock (the prepare of the final DML) to releasing the first lock after commit. If two transactions’ lock‑intervals overlap, they have no lock conflict and can be replayed in parallel.

Like the commit‑parent approach, the master records each transaction’s lock‑interval in the binlog (using last_committed and sequence_number). The slave compares the incoming transaction’s interval with those of currently running transactions; if the start point is later than the earliest end point among active transactions, it must wait, otherwise it can run in parallel.

- Parallelizable:
  Trx1 -----L---------C------------>
  Trx2 ----------L---------C------->

- Not parallelizable:
  Trx1 -----L----C----------------->
  Trx2 ---------------L----C------->

4. Binlog Group Commit

To guarantee durability, MySQL must fsync the WAL before acknowledging a commit. Performing an fsync per transaction is costly, so MySQL groups multiple transactions’ writes into a single fsync operation, both for the redo log and the binlog.

For a single transaction, MySQL uses a two‑phase commit (2PC) to keep redo log and binlog consistent. In versions prior to 5.6, both prepare and commit required separate fsyncs, resulting in three fsync calls per transaction. MySQL 5.7 optimizes this by eliminating the redo‑log fsync in the commit phase and by fsync‑ing prepare logs together with the binlog group commit.

Group commit also preserves the order of transactions across redo log and binlog, which is essential for tools like XtraBackup that rely on this ordering.

MYSQL_BIN_LOG::ordered_commit --> process_flush_stage_queue --> ha_flush_logs // fsync prepare redo log
                                 |__ flush_thread_caches // write binlog caches

Key parameters influencing group commit: binlog_max_flush_queue_time: maximum wait time for the flush‑stage leader before moving to the next stage (effective before 5.7.9). binlog_group_commit_sync_delay and binlog_group_commit_sync_no_delay_count: delay (in ms) and transaction count thresholds that trigger a group commit.

5. Master‑Side MTS Support

The master records last_committed and sequence_number in each GTID_event. These values are derived from the global logical clock inside MYSQL_BIN_LOG and are written to the binlog during the flush stage of group commit.

#170814 11:48:16 server id 1 end_log_pos 259 CRC32 0xb7cf2899 GTID last_committed=0 sequence_number=1
#170814 11:48:22 server id 1 end_log_pos 513 CRC32 0x8bfdd3a0 GTID last_committed=0 sequence_number=2
#170814 11:48:35 server id 1 end_log_pos 767 CRC32 0x56e7f3f7 GTID last_committed=2 sequence_number=3
#170814 11:48:49 server id 1 end_log_pos 1021 CRC32 0xb601ea8f GTID last_committed=3 sequence_number=4

Relevant classes include Gtid_event, Transaction_ctx, and MYSQL_BIN_LOG, which together maintain the logical clock and generate the lock‑interval timestamps.

6. Slave‑Side MTS Implementation

When START SLAVE is issued, the slave launches an I/O thread and a coordinator (SQL) thread. The coordinator reads events from the relay log, decides whether each event can be dispatched to a worker, and either hands it off or replays it itself.

handle_slave_sql --> slave_start_workers // fork workers (count = slave_parallel_workers)
               |__ exec_relay_log_event // loop processing each event
               |__ slave_stop_workers // STOP SLAVE

The coordinator uses next_event to fetch a binlog event, then apply_event to determine dispatch. If dispatchable, append_item_to_jobs adds the event (and any previously buffered events) to the appropriate worker’s queue.

exec_relay_log_event --> next_event --> read_log_event
               |__ mts_checkpoint_routine // checkpoint handling
               |__ apply_event_and_update_pos --> apply_event
                                   |__ append_item_to_jobs (pass to worker)

Workers execute their job groups, update commit positions, and signal completion via the global assigned queue (GAQ). The coordinator periodically runs mts_checkpoint_routine to remove finished job groups from GAQ and advance the low‑water‑mark.

slave_worker_exec_job_group --> slave_worker_exec_event (pop jobs) --> update vars like CGEP
               |__ slave_worker_ends_group --> commit_positions (update Slave_job_group)

Mapping of schemas to workers (used for schema‑based MTS) is stored in Relay_log_info.mapping_db_to_worker as a hash table of db_worker_hash_entry structures.

7. Summary

The article covered MySQL binlog group commit and three parallel replication techniques: schema‑based, commit‑parent based, and lock‑interval based, explaining their internal mechanisms, master‑side data recording, and slave‑side coordination.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql MTS parallel replication

Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.