
Understanding InnoDB Change Buffer: Architecture, Operations, and Constraints

The InnoDB change buffer (formerly the insert buffer) caches modifications to secondary index pages that are not in the buffer pool. It uses a B-tree to batch inserts, delete-marks, and deletes, and it must avoid index page splits and merges while keeping purge safe.

Tencent Database Technology

The change buffer is the generalization, introduced in MySQL 5.5, of the older insert buffer. It temporarily caches operations on secondary index pages that are not resident in the buffer pool and applies them in a batch once the page is loaded, which reduces random disk I/O.

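Why only secondary index pages? The primary (clustered) index must enforce uniqueness, and checking uniqueness requires reading the target page. If an insert were cached without that check, a violation would only surface during the batch apply, when it can no longer be rejected cleanly. The change buffer therefore handles only secondary index pages; for the same reason, inserts into unique secondary indexes are normally not buffered.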

The buffer stores three types of operations:

BTR_INSERT_OP – regular insert

BTR_DELMARK_OP – delete‑mark applied during update/delete

BTR_DELETE_OP – physical removal, by the purge thread, of records that were delete-marked

Organization (B‑tree)

The change buffer is a write cache organized as a B-tree and stored in the system tablespace; its root resides on page 4 (FSP_IBUF_TREE_ROOT_PAGE_NO).

ibuf entry layout

Each cached operation is an entry with the following fields:

IBUF_REC_FIELD_SPACE – space id of the secondary index page

IBUF_REC_FIELD_MARKER – version marker (default 0)

IBUF_REC_FIELD_PAGE_NO – page number of the secondary index page

IBUF_REC_OFFSET_COUNTER – per‑page incremental counter (not strictly monotonic)

IBUF_REC_OFFSET_TYPE – operation type (IBUF_OP_INSERT / IBUF_OP_DELETE_MARK / IBUF_OP_DELETE)

IBUF_REC_OFFSET_FLAGS – record format (REDUNDANT / COMPACT)

IBUF_REC_FIELD_USER – user record data

The entry counter, together with space_id and page_no, forms the primary key of an ibuf entry. When inserting a new entry, InnoDB searches in PAGE_CUR_LE mode for the largest existing counter for the same page and uses max_counter + 1 as the new counter. After a merge has removed all entries for a page, the counter restarts from zero.

Constraints: No index SMO

An operation that could cause a secondary‑index page structural modification (SMO) – either a page split or a page merge (when only one record remains) – must not be cached. If an entry would trigger such a condition, the change buffer abandons further caching for that page.

Tracking free space with ibuf bitmap pages

Each ibuf bitmap page stores four bits for every index page it covers:

IBUF_BITMAP_FREE (2 bits) – free‑space range

IBUF_BITMAP_BUFFERED (1 bit) – whether the page has buffered operations

IBUF_BITMAP_IBUF (1 bit) – whether the page is a node of the ibuf B‑tree
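As a rough illustration of the four-bits-per-page layout, the sketch below packs two page descriptors into each byte. IbufBitmapSketch and its methods are hypothetical names, and the bit positions inside each nibble are assumed for illustration; the real bitmap page also carries a page header and uses InnoDB's own bit helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of an ibuf bitmap: 4 bits per index page,
// so one byte describes two pages. Bit positions are illustrative only.
struct IbufBitmapSketch {
  std::vector<std::uint8_t> bytes;

  explicit IbufBitmapSketch(std::size_t n_pages)
      : bytes((n_pages + 1) / 2, 0) {}

  // Assumed nibble layout: bits 0-1 free space, bit 2 buffered, bit 3 ibuf.
  std::uint8_t get(std::size_t page_no) const {
    assert(page_no / 2 < bytes.size());
    const unsigned shift = static_cast<unsigned>((page_no % 2) * 4);
    return static_cast<std::uint8_t>((bytes[page_no / 2] >> shift) & 0x0F);
  }

  void set(std::size_t page_no, unsigned free_level, bool buffered, bool is_ibuf) {
    assert(page_no / 2 < bytes.size());
    assert(free_level <= 3);
    const unsigned shift = static_cast<unsigned>((page_no % 2) * 4);
    const unsigned nibble = free_level | (buffered ? 0x4u : 0u) | (is_ibuf ? 0x8u : 0u);
    std::uint8_t &b = bytes[page_no / 2];
    // Clear the old nibble, then write the new one.
    b = static_cast<std::uint8_t>((b & ~(0x0Fu << shift)) | (nibble << shift));
  }

  unsigned free_level(std::size_t page_no) const { return get(page_no) & 0x3u; }
  bool has_buffered_ops(std::size_t page_no) const { return (get(page_no) & 0x4u) != 0; }
  bool is_ibuf_page(std::size_t page_no) const { return (get(page_no) & 0x8u) != 0; }
};
```

Packing the descriptor this tightly is what lets a single bitmap page describe a large run of index pages without reading any of them.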

The free-space bits are calculated by the function ibuf_index_page_calc_free_bits:

UNIV_INLINE
ulint ibuf_index_page_calc_free_bits(ulint page_size, ulint max_ins_size) {
  // max_ins_size is the remaining free space on the page; IBUF_PAGE_SIZE_PER_FREE_SPACE is 32.
  // For a 16 KB page, page_size / IBUF_PAGE_SIZE_PER_FREE_SPACE = 512 bytes, a coarse unit.
  // n = max_ins_size / 512 determines the recorded free-space level:
  //   n > 3  (max_ins_size >= 2048)         => record 3
  //   n == 3 (1536 <= max_ins_size < 2048)  => record 2
  //   n == 2 (1024 <= max_ins_size < 1536)  => record 2
  //   n == 1 (512 <= max_ins_size < 1024)   => record 1
  //   n == 0 (max_ins_size < 512)           => record 0
  ulint n = max_ins_size / (page_size / IBUF_PAGE_SIZE_PER_FREE_SPACE);
  if (n == 3) { n = 2; }
  if (n > 3) { n = 3; }
  return (n);
}

After each insert/update/delete, the bitmap is updated via ibuf_update_free_bits_if_full or ibuf_update_free_bits_low.

Preventing page splits

When buffering IBUF_OP_INSERT, InnoDB checks whether the accumulated buffered inserts would exceed the page's remaining space. That space is known only through the bitmap, so the usable budget is at most 2048 bytes on a 16 KB page (IBUF_BITMAP_FREE can be at most 3). The function ibuf_get_volume_buffered computes the total volume of inserts already buffered for the page, and the result is compared with the free space decoded from the free-space bits.

ibuf_insert_low {
  ...
  /* Find out the volume of already buffered inserts for the same index page */
  min_n_recs = 0;
  buffered = ibuf_get_volume_buffered(&pcur, page_id.space(), page_id.page_no(),
                                      op == IBUF_OP_DELETE ? &min_n_recs : NULL, &mtr);
  if (op == IBUF_OP_INSERT) {
    ulint bits = ibuf_bitmap_page_get_bits(bitmap_page, page_id, page_size,
                                           IBUF_BITMAP_FREE, &bitmap_mtr);
    if (buffered + entry_size + page_dir_calc_reserved_space(1)
        > ibuf_index_page_calc_free_from_bits(page_size, bits)) {
      /* Not enough space – force a merge */
      do_merge = TRUE;
      ibuf_get_merge_page_nos(FALSE, btr_pcur_get_rec(&pcur), &mtr,
                              space_ids, page_nos, &n_stored);
      goto fail_exit;
    }
  }
}
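The space check above can be modelled in isolation. The following is a minimal sketch, not InnoDB's code: free_bits_from_bytes repeats the arithmetic of the earlier listing, free_bytes_from_bits is an assumed decoding in which level 3 stands for four units (consistent with the 2048-byte maximum mentioned above), and InsertBudget is a hypothetical tracker that keeps a running total instead of rescanning the buffered entries.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kUnitsPerPage = 32;  // IBUF_PAGE_SIZE_PER_FREE_SPACE

// Encode free bytes into the 2-bit level (same arithmetic as the listing).
inline unsigned free_bits_from_bytes(std::size_t page_size, std::size_t max_ins_size) {
  assert(page_size > kUnitsPerPage);
  std::size_t n = max_ins_size / (page_size / kUnitsPerPage);
  if (n == 3) n = 2;
  if (n > 3) n = 3;
  return static_cast<unsigned>(n);
}

// Assumed decoding: level 3 means four units, lower levels mean `bits` units.
inline std::size_t free_bytes_from_bits(std::size_t page_size, unsigned bits) {
  assert(bits < 4);
  const std::size_t unit = page_size / kUnitsPerPage;
  return bits == 3 ? 4 * unit : bits * unit;
}

// Hypothetical per-page tracker for the volume of buffered inserts.
class InsertBudget {
 public:
  InsertBudget(std::size_t page_size, unsigned free_bits)
      : limit_(free_bytes_from_bits(page_size, free_bits)) {}

  // Returns true if the insert may be buffered; false means "merge instead".
  bool try_buffer(std::size_t entry_size, std::size_t dir_reserved) {
    const std::size_t need = entry_size + dir_reserved;
    if (buffered_ + need > limit_) return false;
    buffered_ += need;
    return true;
  }

  std::size_t buffered() const { return buffered_; }

 private:
  std::size_t limit_;
  std::size_t buffered_ = 0;
};
```

Under these assumptions the estimate is conservative: on a 16 KB page, 1600 free bytes encode to level 2 and decode back to 1024 bytes, so the decoded budget never exceeds the space that was actually free when the bits were written. That is also why n == 3 is folded down to 2 rather than stored as 3.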

Preventing page merges

Before buffering a purge delete, InnoDB estimates how many records will remain on the page once all buffered changes have been applied, using ibuf_get_volume_buffered_count. It decrements the count for each IBUF_OP_DELETE and increments it for inserts, taking care to avoid double-counting when an insert reuses a delete-marked record.

static ulint ibuf_get_volume_buffered( ... ulint *n_recs, /*!< in/out: minimum number of records on the page after the buffered changes have been applied, or NULL to disable the counting */ )

If a buffered IBUF_OP_DELETE would leave fewer than two records on a page, the operation is not cached because it could empty the page.
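The counting rule can be sketched as follows. This is a simplified model rather than InnoDB's implementation: the Op enum, the BufferedOp struct, and both function names are hypothetical, a std::set stands in for InnoDB's duplicate detection, and counting delete-marks the same way as inserts is an assumption.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Op { Insert, DeleteMark, Delete };

struct BufferedOp {
  Op op;
  std::string key;  // stands in for the secondary-index record
};

// Lower bound on the records left on the page after all buffered
// operations have been applied.
inline long min_records_after(const std::vector<BufferedOp> &ops) {
  std::set<std::string> seen;
  long n_recs = 0;
  for (const BufferedOp &b : ops) {
    if (b.op == Op::Delete) {
      --n_recs;  // a record will be physically removed
    } else if (seen.insert(b.key).second) {
      ++n_recs;  // first operation on this record: count it once only
    }
  }
  return n_recs;
}

// A further delete is buffered only if at least two records are known to remain.
inline bool may_buffer_delete(const std::vector<BufferedOp> &ops) {
  return min_records_after(ops) >= 2;
}
```

For example, an insert of "a", a delete-mark of "a", and an insert of "b" yield a count of two, so one more delete could be buffered; after that delete the count drops to one and further deletes would go to the page directly.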

Change Buffer write flow

Construct an ibuf entry with a temporary primary key (space id, page no, 0xFFFF).

Search the ibuf B-tree in PAGE_CUR_LE mode to locate the entry with the largest counter for the target page; set the new counter to max_counter + 1.

Update the entry's IBUF_REC_OFFSET_COUNTER with the new counter.

Insert the entry into the ibuf B-tree using the generic B-tree API (btr_cur_optimistic_insert or btr_cur_pessimistic_insert).
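The four steps can be imitated with an ordered map. In this sketch a std::map stands in for the ibuf B-tree, and IbufKey, IbufTree, next_counter, and buffer_op are hypothetical names; counter overflow and the latching that the real code needs are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// (space id, page no, counter): the ordering key of an ibuf entry.
using IbufKey = std::tuple<std::uint32_t, std::uint32_t, std::uint16_t>;
// std::map stands in for the ibuf B-tree; the value is the buffered record.
using IbufTree = std::map<IbufKey, std::string>;

inline std::uint16_t next_counter(const IbufTree &tree, std::uint32_t space,
                                  std::uint32_t page_no) {
  // Step 1: a probe key with the temporary counter 0xFFFF sorts after
  // every real entry of the same page.
  const IbufKey probe(space, page_no, std::uint16_t{0xFFFF});
  // Step 2: upper_bound followed by one step back gives the last key
  // that is <= probe, which is what a PAGE_CUR_LE search returns.
  auto it = tree.upper_bound(probe);
  if (it == tree.begin()) return 0;
  --it;
  if (std::get<0>(it->first) != space || std::get<1>(it->first) != page_no) {
    return 0;  // no entries for this page: the counter starts from zero
  }
  const std::uint16_t max_counter = std::get<2>(it->first);
  assert(max_counter < 0xFFFE);  // sketch only: no overflow handling
  return static_cast<std::uint16_t>(max_counter + 1);
}

inline IbufKey buffer_op(IbufTree &tree, std::uint32_t space,
                         std::uint32_t page_no, const std::string &rec) {
  // Steps 3 and 4: fix up the counter, then insert into the tree.
  const IbufKey key(space, page_no, next_counter(tree, space, page_no));
  tree.emplace(key, rec);
  return key;
}
```

Because the key orders entries by page first and counter second, all operations for one page sit next to each other in application order, which is what makes a batch merge a single range scan.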

Special handling of purge operations

The change buffer also caches purge-thread deletions (IBUF_OP_DELETE). Before a purge thread caches an operation, it checks whether the secondary-index record can be safely removed (old version not a delete-mark, ROW_TRX_ID > purge view, and matching primary-key/secondary-key values). If an insert would conflict with a purge that is in progress on the same page, the system does not cache the insert, so that the transaction's effect cannot be lost.

The purge thread uses a watch array (buf_pool->watch) to track pages it accesses. When a page is found in the watch list, the change buffer skips caching further operations for that page. A watch is registered by inserting a sentinel page descriptor into the page hash:

bpage->state = BUF_BLOCK_ZIP_PAGE;
bpage->id = page_id;
bpage->buf_fix_count = 1;
ut_d(bpage->in_page_hash = TRUE);
HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, page_id.fold(), bpage);

Together, these mechanisms let the change buffer improve write performance without compromising index integrity or purge correctness.
