Databases 8 min read

InnoDB Index Physical Space Usage Statistics and Page Scan Optimization

The article explains how CDB introduced an InnoDB index physical space usage statistic to help users decide when to run OPTIMIZE TABLE, describes the background of page fragmentation, compares incremental and full‑scan methods, and details a sequential I/O optimization that dramatically reduces measurement time.

Tencent Database Technology
Tencent Database Technology
Tencent Database Technology
InnoDB Index Physical Space Usage Statistics and Page Scan Optimization

Introduction: To help users decide when to run OPTIMIZE TABLE, CDB developed a feature to report InnoDB index physical space usage.

Background

InnoDB stores rows; deletions are marked and reclaimed asynchronously. Pages are typically 16 KB. When a large proportion of records are deleted (>40 %), page space utilization drops, causing file bloat, scattered I/O and degraded performance. Users currently must rebuild tables manually, which is time‑consuming.

Page Space Statistics Method

Each InnoDB B‑tree corresponds to an index, so index space statistics equal B‑tree space statistics. A page consists of allocated header, unused area, and FIL_TRAILER. The actual used space is calculated as USED = ALLOCED – DELETED + FIL_TRAILER.

Design Discussion

Two approaches were considered: incremental and full‑scan statistics. Incremental updates give real‑time values but add overhead to normal DB operations and require re‑initialization after restart. Full‑scan can be costly for large datasets.

Full scan of all pages – high I/O consumption.

Persisting statistics – compatibility issues with external MySQL replicas.

Given the drawbacks of incremental collection, the final design adopts a full‑scan that traverses B‑tree layers from left to right, accumulating page data and recording delete‑mark ratios.

Page Scan Optimization

4.1 Scan Strategy

B‑tree pages are logically adjacent but not physically; naïve layer‑by‑layer scanning results in random I/O. By identifying the set of pages belonging to a B‑tree, sorting their page numbers, and accessing them sequentially, random I/O can be turned into sequential I/O.

InnoDB’s physical layout consists of tablespaces, extents, segments and pages. Each B‑tree uses two segments (leaf and non‑leaf). Segments contain 32 fragment pages plus three extent lists, enabling the collection of all relevant pages via the root page.

4.2 Additional Benefits

The sequential‑I/O approach reduces lock contention: only the current page’s read lock is needed, eliminating the need for index locks and adjacent‑page locks.

4.3 Optimization Results

On a 48‑core test machine with a 96 GB table and a 64 GB buffer pool, the optimized scan reduced the time for physical‑space statistics from 220 s to 93 s, demonstrating the efficiency gain.

Final Outcome

The index physical‑space usage statistics feature is now available in the latest CDB version, supporting InnoDB regular, compressed and partitioned tables. For compressed tables, statistics are reported on the uncompressed size; for partitioned tables, results are weighted across partitions.

Conclusion

With this feature, DBAs no longer have to guess when to run OPTIMIZE TABLE, simplifying maintenance and improving storage efficiency.

InnoDBDatabase Performanceindex space usageOPTIMIZE TABLEpage scan optimization
Tencent Database Technology
Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.