Understanding InnoDB Savepoint Implementation: Undo Log Sequence, Savepoint Structure, and Management
This article, based on the MySQL 8.0.32 source code, examines how InnoDB implements savepoints: how undo logs are sequenced, which savepoint structures the server, binlog, and InnoDB layers create, how a savepoint with a duplicate name is located and deleted, and the three steps used to save a new savepoint.
1. Undo Log Sequence
InnoDB transaction objects contain an undo_no attribute. Each data‑modifying operation generates an undo log record whose sequence number is taken from the current undo_no. The sequence starts at 0 for each transaction and increments by one for each subsequent undo record. The savepoint structure records the undo_no value at the moment the savepoint is created.
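The counter behavior described above fits in a few lines. The following C++ is a minimal sketch under simplifying assumptions: Trx, UndoRecord, modify_row, and savepoint_undo_no are hypothetical names, not InnoDB's definitions, and the undo log is modeled as an in-memory vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the per-transaction undo counter.
// Names are illustrative; they are not InnoDB's actual definitions.
using undo_no_t = std::uint64_t;

struct UndoRecord {
  undo_no_t undo_no;  // sequence number taken from the transaction counter
};

struct Trx {
  undo_no_t undo_no = 0;             // next undo number; starts at 0
  std::vector<UndoRecord> undo_log;  // stands in for the real undo log

  // Each data-modifying operation writes one undo record stamped with the
  // current counter value, then advances the counter by one.
  void modify_row() {
    undo_log.push_back(UndoRecord{undo_no});
    ++undo_no;
  }

  // A savepoint only needs the counter value at creation time: in this
  // model, every undo record with undo_no >= that value came after it.
  undo_no_t savepoint_undo_no() const { return undo_no; }
};
```

After two calls to modify_row, savepoint_undo_no returns 2, and the next undo record written is stamped with that same value, which is why a single integer is enough to mark the savepoint's position in the undo log.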
2. Savepoint Structure
When a SAVEPOINT statement is issued, three structures are created:
A server‑side SAVEPOINT object, whose fields include prev (the previously created savepoint), name, and mdl_savepoint.
A binlog savepoint: a single 8‑byte integer holding the current write position (byte offset) in the transaction's binlog cache, trx_cache.
An InnoDB trx_named_savept_t object containing name, a trx_savept_t sub‑object (which stores the undo_no value), and a trx_savepoints list node that links it into the transaction's list of savepoints.
The server allocates a 96‑byte memory block for each savepoint: 48 bytes for the SAVEPOINT object, 40 bytes for the InnoDB trx_named_savept_t , and 8 bytes for the binlog offset.
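The 96-byte figure can be checked with mirror structs on a 64-bit build. This is a sketch, not MySQL's headers: the struct names are hypothetical, the length and ha_list fields and the two-pointer layout of mdl_savepoint in the server mirror are assumptions chosen to reach 48 bytes, and the extra 8-byte field in the InnoDB mirror (assumed here to be a binlog cache position) is what brings that object to 40 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror structs reproducing the 64-bit sizes quoted in the
// text (48 + 40 + 8 = 96). Field lists are simplified assumptions, not the
// exact MySQL 8.0.32 definitions.
struct SavepointMirror {       // server-side SAVEPOINT, 48 bytes
  SavepointMirror *prev;       // previously created savepoint
  char *name;
  std::size_t length;          // assumption: name length
  void *ha_list;               // assumption: per-engine info pointer
  void *mdl_savepoint[2];      // assumption: MDL state as two pointers
};

struct TrxNamedSaveptMirror {    // InnoDB trx_named_savept_t, 40 bytes
  char *name;
  std::uint64_t least_undo_no;   // trx_savept_t: undo_no at creation
  std::int64_t binlog_cache_pos; // assumption: one extra 8-byte field
  TrxNamedSaveptMirror *prev;    // list node for trx_savepoints
  TrxNamedSaveptMirror *next;
};

using BinlogSavepoint = std::uint64_t;  // binlog offset, 8 bytes

constexpr std::size_t kSavepointBlock = sizeof(SavepointMirror) +
                                        sizeof(TrxNamedSaveptMirror) +
                                        sizeof(BinlogSavepoint);

static_assert(sizeof(void *) == 8, "sizes below assume a 64-bit build");
static_assert(sizeof(SavepointMirror) == 48, "server object");
static_assert(sizeof(TrxNamedSaveptMirror) == 40, "InnoDB object");
static_assert(kSavepointBlock == 96, "one allocation per savepoint");
```

Every member is 8 bytes wide, so no padding is inserted and the sizes add up exactly.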
3. Finding a Savepoint with the Same Name
Each client thread maintains an m_savepoints linked list. When creating a new savepoint, the server scans this list from newest to oldest to detect any existing savepoint with the same name.
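A newest-to-oldest scan is natural when each entry points to the one created before it. The sketch below assumes a simplified Savepoint struct and a hypothetical find_savepoint helper; it uses a byte-wise strcmp, whereas the server's real name comparison may be character-set aware.

```cpp
#include <cassert>
#include <cstring>

// Simplified, hypothetical model of the per-thread savepoint list. The
// head always points to the newest savepoint and `prev` leads to older
// ones, so scanning from the head visits entries newest to oldest.
struct Savepoint {
  Savepoint *prev;
  const char *name;
};

// Returns the address of the link that points at the matching savepoint,
// or the address of the terminating null link if no name matches.
// Returning Savepoint** lets a caller unlink the match without rescanning.
Savepoint **find_savepoint(Savepoint **head, const char *name) {
  assert(head != nullptr && name != nullptr);
  Savepoint **sv = head;
  while (*sv != nullptr) {
    if (std::strcmp((*sv)->name, name) == 0) break;
    sv = &(*sv)->prev;
  }
  return sv;
}
```

Returning a pointer to the link rather than to the node is the design choice that matters here: the caller can test `*result` for a hit and, if needed, splice the entry out with a single assignment.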
4. Deleting a Duplicate Savepoint
If a duplicate is found, the server unlinks the corresponding SAVEPOINT object from m_savepoints. The binlog offset stored in the server‑side memory needs no special handling, because the server reuses the old savepoint's memory block for the new savepoint and the offset is simply overwritten. InnoDB, however, must locate and delete the matching trx_named_savept_t from its own trx_savepoints list.
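Removal works differently on the two sides: the server splices an entry out of its singly linked list using the link found by the scan, while InnoDB searches its own list. The following is a minimal sketch; Savepoint, NamedSavept, unlink_savepoint, and erase_named_savept are hypothetical names, and std::list stands in for InnoDB's own list type.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical model of the two lists touched when a name is reused.
struct Savepoint {   // server side, singly linked newest -> oldest
  Savepoint *prev;
  const char *name;
};

struct NamedSavept {  // InnoDB side, one entry per savepoint
  std::string name;
  unsigned long long least_undo_no;
};

// Server side: `link` is the pointer produced by a newest-to-oldest scan
// (either the list head or some newer entry's `prev`). Splicing the match
// out is one assignment; the block is returned so the caller can reuse it.
Savepoint *unlink_savepoint(Savepoint **link) {
  Savepoint *old = *link;
  assert(old != nullptr);
  *link = old->prev;
  return old;
}

// InnoDB side: search the transaction's own list for the matching key
// and erase that entry. Returns false if nothing matched.
bool erase_named_savept(std::list<NamedSavept> &trx_savepoints,
                        const std::string &name) {
  for (auto it = trx_savepoints.begin(); it != trx_savepoints.end(); ++it) {
    if (it->name == name) {
      trx_savepoints.erase(it);
      return true;
    }
  }
  return false;
}
```

Because unlink_savepoint hands back the removed block instead of freeing it, the caller is free to reuse that memory for the replacement savepoint.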
5. Saving a New Savepoint (Three Steps)
The binlog layer generates a Query_log_event such as SAVEPOINT `test_savept`, writes it to trx_cache, and then records the resulting trx_cache position as the savepoint's binlog offset.
InnoDB creates a trx_named_savept_t object, appends it to the tail of the transaction’s trx_savepoints list, and stores the current undo_no in its savept sub‑object.
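The server‑side SAVEPOINT object becomes the newest entry of the thread's m_savepoints list: its prev field points to the previously newest savepoint, and m_savepoints is updated to point to it.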
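Putting the duplicate check and the three steps together gives the following minimal C++ model. All names (Session, ServerSavepoint, InnoSavepoint, savepoint) are hypothetical; a std::string stands in for trx_cache, and the statement text appended to it is only a placeholder for a real Query_log_event.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <list>
#include <string>

// Hypothetical end-to-end model of SAVEPOINT <name>. None of these types
// or functions are MySQL's real API.
struct ServerSavepoint {
  ServerSavepoint *prev = nullptr;  // previously created savepoint
  std::string name;
  std::uint64_t binlog_pos = 0;     // 8-byte binlog offset slot
};

struct InnoSavepoint {
  std::string name;
  std::uint64_t least_undo_no;      // undo_no captured at creation
};

struct Session {
  std::string trx_cache;                    // stands in for the binlog cache
  std::uint64_t undo_no = 0;                // InnoDB undo counter
  std::list<InnoSavepoint> trx_savepoints;  // InnoDB list, oldest first
  ServerSavepoint *m_savepoints = nullptr;  // newest server savepoint
  std::deque<ServerSavepoint> storage;      // owns server blocks (stable addresses)

  void savepoint(const std::string &name) {
    // Look for an existing savepoint with the same name, newest first.
    ServerSavepoint **link = &m_savepoints;
    while (*link != nullptr && (*link)->name != name) link = &(*link)->prev;

    ServerSavepoint *sv;
    if (*link != nullptr) {
      sv = *link;        // in this model, reuse the old block...
      *link = sv->prev;  // ...after unlinking it from the server list
      trx_savepoints.remove_if(
          [&](const InnoSavepoint &s) { return s.name == name; });
    } else {
      storage.emplace_back();
      sv = &storage.back();
    }
    sv->name = name;

    // Step 1: log the statement, then remember the cache position after it.
    trx_cache += "SAVEPOINT `" + name + "`;";
    sv->binlog_pos = trx_cache.size();

    // Step 2: InnoDB appends its entry at the tail with the current undo_no.
    trx_savepoints.push_back(InnoSavepoint{name, undo_no});

    // Step 3: the server entry becomes the newest element of m_savepoints.
    sv->prev = m_savepoints;
    m_savepoints = sv;
  }
};
```

Creating "a", then "b", then "a" again leaves exactly one "a" in each list, now positioned as the newest entry, with its undo_no and binlog position refreshed to the values current at the second SAVEPOINT.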
6. Summary
The server creates a SAVEPOINT object to hold savepoint metadata.
The binlog offset (8 bytes) is written into the server‑allocated memory.
InnoDB maintains its own linked list of trx_named_savept_t objects.
If a duplicate name exists, the server removes the old savepoint from both the server list and InnoDB’s list before creating the new one.
The newly created SAVEPOINT is appended to m_savepoints, and the corresponding InnoDB object is appended to trx_savepoints.
Future topics will explore why the SAVEPOINT statement is written to the binlog and how rollback to a savepoint is performed.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.