Understanding MySQL Parallel Replication: From Lag to Group Commit
This article explains why master‑slave lag occurs in MySQL, describes the evolution of parallel replication schemes—including group‑commit, Commit‑Parent‑Based, Lock‑Based, and WRITESET approaches—shows benchmark results, and provides practical configuration steps to enable high‑performance parallel replication.
Anyone who has maintained MySQL in production knows that master‑slave replication lag is a painful problem that can cause stale reads and affect high‑availability failover.
Contents:
Impact of replication lag
Overview of parallel replication schemes
Group‑commit based parallel replication (Commit‑Parent‑Based and Lock‑Based)
WRITESET scheme
Benchmark results
How to enable parallel replication
1. Impact of Replication Lag
Lag leads to two main issues: (1) read‑write split workloads may read stale data, and (2) large lag hampers the speed of failover because the replica must apply all pending binlog events before switching.
If the replica waits for all binlog changes before failover, service availability is reduced.
If it switches immediately, un‑applied changes are lost, which many applications cannot tolerate.
2. Parallel Replication Overview
MySQL has introduced several parallel replication schemes over time:
MySQL 5.6 – database‑level parallelism (of little use when a single database holds many tables, since such workloads offer nothing to parallelize across databases).
MySQL 5.7 – group‑commit based parallelism.
MySQL 8.0 – WRITESET based parallelism.
3. Group‑Commit Based Parallel Replication
3.1 Commit‑Parent‑Based Scheme
Transactions go through a Prepare phase and a Commit phase. Because InnoDB uses pessimistic locking, two transactions that are both in the Prepare phase at the same time cannot hold conflicting locks (otherwise one would be blocked waiting for the other), so they can safely be replayed in parallel on the replica.
Implementation details:
The master maintains a global counter that is incremented before a transaction commits at the storage engine level.
Before entering Prepare, the current counter value is stored in the transaction as its commit‑parent.
The commit‑parent is written into the binlog header.
During replay, if two transactions share the same commit‑parent they are executed in parallel.
Example of seven transactions:
<code>Trx1 ------------P----------C--------------------------------></code>
<code>                            |</code>
<code>Trx2 ----------------P------+---C----------------------------></code>
<code>                            |   |</code>
<code>Trx3 -------------------P---+---+-----C----------------------></code>
<code>                            |   |     |</code>
<code>Trx4 -----------------------+-P-+-----+----C-----------------></code>
<code>                            |   |     |    |</code>
<code>Trx5 -----------------------+---+-P---+----+---C-------------></code>
<code>                            |   |     |    |   |</code>
<code>Trx6 -----------------------+---+---P-+----+---+---C----------></code>
<code>                            |   |     |    |   |   |</code>
<code>Trx7 -----------------------+---+-----+----+---+-P-+--C-------></code>
<code>                            |   |     |    |   |   |  |</code>
Result:
Trx1, Trx2, Trx3 execute in parallel.
Trx4 executes serially.
Trx5, Trx6 execute in parallel.
Trx7 executes serially.
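The counter bookkeeping described above can be sketched as follows. This is a minimal simulation, not MySQL source; all names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Global counter, incremented each time a transaction commits at the
// storage-engine level (illustrative stand-in for the real counter).
static int64_t g_commit_counter = 0;

struct Trx {
  int64_t commit_parent = -1;  // counter value captured before Prepare
};

// Called when a transaction enters the Prepare phase on the master.
void on_prepare(Trx &t) { t.commit_parent = g_commit_counter; }

// Called when a transaction commits in the storage engine.
void on_commit(Trx &) { ++g_commit_counter; }

// On the replica: two transactions may be replayed in parallel
// if they recorded the same commit-parent.
bool can_run_in_parallel(const Trx &a, const Trx &b) {
  return a.commit_parent == b.commit_parent;
}
```

In the seven‑transaction timeline, Trx1–Trx3 all prepare before any commit happens, so they share commit‑parent 0 and form one parallel group; Trx4 prepares after Trx1's commit, so it gets a new commit‑parent and runs alone.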
3.2 Lock‑Based Scheme
This scheme introduces the concept of a locking interval, defined from the moment the last DML statement of a transaction acquires its lock in the Prepare phase to the moment the locks are released at storage‑engine commit.
If two transactions have overlapping locking intervals, they have no lock conflict and can be replayed in parallel.
<code>Trx1 -----L---------C------------></code>
<code>Trx2 ----------L---------C-------></code>
Conversely, transactions whose locking intervals do not overlap cannot be replayed in parallel:
<code>Trx1 -----L----C-----------------></code>
<code>Trx2 ---------------L----C-------></code>
To implement this, the master tracks four variables:
global.transaction_counter – transaction counter.
transaction.sequence_number – per‑transaction sequence number.
global.max_committed_transaction – maximum committed sequence number.
transaction.last_committed – maximum committed sequence number before the transaction enters Prepare.
These values are written to the binlog (as GTID_LOG_EVENT for GTID‑based replication, or ANONYMOUS_GTID_LOG_EVENT otherwise).
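A simplified model of how the master maintains these values (illustrative names and call points; the real code assigns sequence_number during binlog flush inside the group‑commit machinery):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the global variables named above.
static int64_t g_transaction_counter = 0;        // global.transaction_counter
static int64_t g_max_committed_transaction = 0;  // global.max_committed_transaction

struct Trx {
  int64_t sequence_number = 0;  // assigned when the transaction is binlogged
  int64_t last_committed = 0;   // max committed sequence number seen before Prepare
};

// Entering Prepare: snapshot the highest committed sequence number so far.
void on_prepare(Trx &t) { t.last_committed = g_max_committed_transaction; }

// Writing the binlog: hand out the next sequence number in order.
void on_binlog(Trx &t) { t.sequence_number = ++g_transaction_counter; }

// Storage-engine commit: advance the committed high-water mark.
void on_commit(const Trx &t) {
  g_max_committed_transaction =
      std::max(g_max_committed_transaction, t.sequence_number);
}
```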
3.3 Parallel Replay Logic on the Replica
The replica maintains a transaction_sequence queue ordered by sequence_number. A new transaction can start executing only if its last_committed is smaller than the sequence_number of the first (oldest) transaction still in the queue:
<code>transaction.last_committed < transaction_sequence[0].sequence_number</code>
Applying this rule to the earlier seven‑transaction example reproduces the groups found by the Commit‑Parent scheme; in general, because overlapping locking intervals capture more concurrency than identical commit parents, the Lock‑Based scheme achieves higher overall parallelism.
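The admission rule can be expressed directly. This is a sketch under simplified assumptions; the real applier also assigns transactions to worker threads and enforces commit ordering:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

struct Trx {
  int64_t sequence_number;
  int64_t last_committed;
};

// Transactions currently executing on worker threads, kept sorted by
// sequence_number (events arrive from the relay log in binlog order).
static std::deque<Trx> transaction_sequence;

// A new transaction may start while the current batch is still running
// only if everything it depends on has already committed, i.e. its
// last_committed precedes the oldest still-running transaction.
bool can_schedule(const Trx &t) {
  return transaction_sequence.empty() ||
         t.last_committed < transaction_sequence.front().sequence_number;
}

void schedule(const Trx &t) { transaction_sequence.push_back(t); }
```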
4. WRITESET Scheme
Introduced in MySQL 8.0, the WRITESET scheme is primarily used by Group Replication for conflict detection during the certification phase. Two concurrent transactions from different nodes are considered non‑conflicting if they do not modify the same row.
4.1 Generating the Writeset
Extract primary‑key, unique‑index, and foreign‑key information for each modified row and concatenate them into a string.
Hash the string using the algorithm selected by transaction_write_set_extraction (default XXHASH64).
Insert the hash value into the transaction’s writeset.
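The three steps can be sketched like this, using std::hash as an illustrative stand‑in for XXHASH64; the row_identifier layout is a simplification of the real encoding:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Build the identifier string for one index entry of a modified row:
// index, table, and database names plus the index values, concatenated.
std::string row_identifier(const std::string &db, const std::string &table,
                           const std::string &index,
                           const std::vector<std::string> &values) {
  std::string s = index + table + db;
  for (const auto &v : values) s += v;
  return s;
}

// Hash the identifier. MySQL uses the algorithm selected by
// transaction_write_set_extraction (XXHASH64 by default);
// std::hash here is only an illustrative substitute.
uint64_t hash_identifier(const std::string &s) {
  return std::hash<std::string>{}(s);
}

// A transaction's writeset: hashes of every row it touched via
// primary keys, unique indexes, or foreign keys.
struct Writeset {
  std::set<uint64_t> hashes;
  void add(const std::string &db, const std::string &table,
           const std::string &index, const std::vector<std::string> &values) {
    hashes.insert(hash_identifier(row_identifier(db, table, index, values)));
  }
};

// Two transactions conflict if their writesets intersect.
bool conflicts(const Writeset &a, const Writeset &b) {
  for (uint64_t h : a.hashes)
    if (b.hashes.count(h)) return true;
  return false;
}
```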
4.2 Implementation Details
<code>void Writeset_trx_dependency_tracker::get_dependency(THD *thd,
                                                     int64 &sequence_number,
                                                     int64 &commit_parent) {
  Rpl_transaction_write_set_ctx *write_set_ctx =
      thd->get_transaction()->get_transaction_write_set_ctx();
  std::vector<uint64> *writeset = write_set_ctx->get_write_set();
  // ... (logic to decide whether WRITESET can be used, update m_writeset_history, etc.)
}
</code>
The function determines whether a transaction can use WRITESET based on factors such as the writeset size, a matching transaction_write_set_extraction setting, foreign‑key relationships, and the history‑size limit.
If WRITESET cannot be used, the transaction falls back to the Lock‑Based scheme.
4.3 Relevant Parameters
binlog_transaction_dependency_tracking – selects the dependency‑tracking scheme (COMMIT_ORDER, WRITESET, WRITESET_SESSION).
transaction_write_set_extraction – hash algorithm for writeset (OFF, MURMUR32, XXHASH64).
binlog_transaction_dependency_history_size – maximum number of entries stored in the writeset history (default 25000).
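Tying these parameters together, the master’s writeset‑based dependency computation can be approximated as follows: look each row hash up in a bounded history map, take the largest sequence number found as the commit parent, and record the current transaction as the latest writer of those rows. This is a simplified model of the idea behind WL#9556, not the actual implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Maps row hash -> sequence number of the last transaction that wrote it,
// bounded by binlog_transaction_dependency_history_size (default 25000).
static std::unordered_map<uint64_t, int64_t> m_writeset_history;
static const size_t kHistorySize = 25000;

// Commit parent for a transaction with the given writeset: it must run
// after the newest transaction that touched any of the same rows;
// rows never seen before impose no dependency.
int64_t get_commit_parent(const std::vector<uint64_t> &writeset,
                          int64_t sequence_number) {
  int64_t commit_parent = 0;
  for (uint64_t h : writeset) {
    auto it = m_writeset_history.find(h);
    if (it != m_writeset_history.end())
      commit_parent = std::max(commit_parent, it->second);
    // Record this transaction as the latest writer, respecting the bound.
    if (it != m_writeset_history.end() ||
        m_writeset_history.size() < kHistorySize)
      m_writeset_history[h] = sequence_number;
  }
  return commit_parent;
}
```

Transactions whose writesets miss the history entirely get commit parent 0 and can run in parallel with everything currently in flight, which is why WRITESET parallelism does not depend on master concurrency.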
5. Benchmark Results
MySQL’s official benchmarks compare COMMIT_ORDER, WRITESET_SESSION, and WRITESET under three workloads (OLTP read/write, indexed‑column update, write‑only) on a 16‑core SSD master with 8 M rows across 16 tables.
Key findings:
COMMIT_ORDER benefits from higher master concurrency; replication speed increases with more threads.
WRITESET’s replica throughput is largely independent of master concurrency; even with a single‑client workload on the master it outperforms COMMIT_ORDER running under 256 client threads.
WRITESET_SESSION behaves like COMMIT_ORDER but still achieves good throughput at lower thread counts (4–8).
6. Enabling Parallel Replication
On the replica, set the following parameters (requires a replication restart):
<code>slave_parallel_type = LOGICAL_CLOCK</code>
<code>slave_parallel_workers = 16</code>
<code>slave_preserve_commit_order = ON</code>
To use the WRITESET scheme, additionally configure the following on the master:
<code>binlog_transaction_dependency_tracking = WRITESET_SESSION</code>
<code>transaction_write_set_extraction = XXHASH64</code>
<code>binlog_transaction_dependency_history_size = 25000</code>
<code>binlog_format = ROW</code>
Note that WRITESET works only when the binlog format is ROW.
7. References
WL#6314: MTS – Prepared transactions slave parallel applier
WL#6813: MTS – ordered commits (sequential consistency)
WL#7165: MTS – Optimizing MTS scheduling by increasing the parallelization window on master
WL#8440: Group Replication – Parallel applier support
WL#9556: Writeset‑based MTS dependency tracking on master
WriteSet parallel replication (Chinese article)
Improving the Parallel Applier with Writeset‑based Dependency Tracking
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles, with a focus on operations transformation.