How HVO Cuts HugeTLB Memory Overhead by Up to 99% in Linux Kernels
This article, based on ByteDance STE's talk at the 2022 Linux Kernel Developer Conference, explains HVO (HugeTLB Vmemmap Optimization), a feature that dramatically reduces the struct page memory overhead of huge pages. It covers HVO's latest enhancements, cross-architecture support, usage steps, performance trade-offs, and future plans.
Linux kernels manage physical memory in 4 KB pages, allocating a struct page (~64 bytes) to describe each one. Consequently, a 1 TB system reserves about 16 GB just for these structures. Huge pages (2 MB, 1 GB, etc.) still carry a struct page per 4 KB sub-page, even though most of those structures hold redundant information, wasting memory.
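The 16 GB figure follows from straightforward arithmetic, which a quick shell calculation confirms (assuming the 64-byte struct page size stated above):

```shell
# struct page overhead for 1 TiB of RAM managed in 4 KiB pages,
# assuming sizeof(struct page) == 64 bytes
pages=$(( (1 << 40) / 4096 ))        # number of 4 KiB pages in 1 TiB
overhead=$(( pages * 64 ))           # bytes consumed by struct page entries
echo "$(( overhead / 1024 / 1024 / 1024 )) GiB"   # prints: 16 GiB
```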
What is HVO?
HVO (HugeTLB Vmemmap Optimization) remaps the vmemmap virtual addresses that back a huge page's tail struct pages onto a single physical page, then frees the now-redundant physical pages. This not only cuts memory usage but also improves cache locality, because accesses through the different virtual addresses hit the same physical cache lines.
Enabling HVO saves roughly 87.5 % of struct page memory for a 2 MB huge page and nearly 100 % for a 1 GB huge page.
2 MB huge page: originally 512 struct pages × 64 B = 32 KB (eight 4 KB vmemmap pages); HVO frees 7 of them (28 KB) → 87.5 % reduction.
1 GB huge page: originally 262 144 struct pages × 64 B = 16 MB; HVO keeps only a single vmemmap page → ~100 % reduction.
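Both per-page figures can be reproduced the same way (again assuming 64-byte struct pages and 4 KB base pages):

```shell
# vmemmap size per huge page, assuming 64-byte struct page and 4 KiB base pages
for hp in $((2 * 1024 * 1024)) $((1024 * 1024 * 1024)); do
  structs=$(( hp / 4096 ))                       # struct pages per huge page
  echo "$hp-byte huge page: $(( structs * 64 / 1024 )) KiB of struct page"
done
# prints 32 KiB for the 2 MiB page and 16384 KiB (16 MiB) for the 1 GiB page
```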
Recent HVO Enhancements
Memory saving for 2 MB huge pages increased from 75 % to 87.5 % after the “Free the 2nd vmemmap page” patch (a 12.5-point improvement).
ARM64 support added alongside existing x86 support.
Compatibility with memmap_on_memory for memory hot-plug scenarios.
Improved continuity of released memory to reduce fragmentation.
Extended to new use-cases such as device-dax.
How to Use HVO
Compile a kernel version ≥ 5.14 with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y (the option was named CONFIG_HUGETLB_PAGE_FREE_VMEMMAP before 5.18). After boot, the kernel log reports the memory that can be saved, e.g.:
<code>[ 1.047319] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[ 1.052204] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page</code>

If the log shows “Not support”, the struct page alignment is insufficient for HVO.
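These lines can be checked at runtime from the kernel log (a sketch; the exact message text varies across kernel versions, and dmesg may require privileges):

```shell
# look for HugeTLB vmemmap messages in the kernel log;
# fall back to a note instead of failing when dmesg is restricted
dmesg 2>/dev/null | grep -i 'HugeTLB' \
  || echo "no HugeTLB lines visible (try with root)"
```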
Enabling HVO
Default-on via kernel config: set CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y.
Command‑line parameter: hugetlb_free_vmemmap=on .
Runtime sysctl: echo 1 > /proc/sys/vm/hugetlb_optimize_vmemmap .
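A quick way to see which of these mechanisms is active on a running system (standard Linux /proc paths; the sysctl file only exists on kernels built with HVO support):

```shell
# 1. boot parameter, if any
grep -o 'hugetlb_free_vmemmap=[a-z]*' /proc/cmdline 2>/dev/null \
  || echo "boot param not set"
# 2. runtime sysctl, if the kernel supports it
if [ -e /proc/sys/vm/hugetlb_optimize_vmemmap ]; then
  echo "sysctl value: $(cat /proc/sys/vm/hugetlb_optimize_vmemmap)"
else
  echo "sysctl not present (HVO not supported by this kernel)"
fi
```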
Balancing Space and Time
HVO incurs a slight performance penalty when a huge page is allocated, because the page must come from the buddy allocator and its vmemmap must be remapped. A practical compromise is to pre-allocate the long-lived pool with HVO enabled (saving space), then disable HVO so that pages allocated at runtime skip the remapping, adjusting /proc/sys/vm/nr_hugepages and /proc/sys/vm/nr_overcommit_hugepages accordingly.
<code># enable HVO so the reserved pool frees its vmemmap
echo 1 > /proc/sys/vm/hugetlb_optimize_vmemmap
# pre-allocate the long-lived pool with HVO applied
echo $RESERVE > /proc/sys/vm/nr_hugepages
# disable HVO so runtime (overcommit) allocations skip the remapping
echo 0 > /proc/sys/vm/hugetlb_optimize_vmemmap
echo $OVERCOMMIT > /proc/sys/vm/nr_overcommit_hugepages</code>

Extending HVO to Other Scenarios
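After running a sequence like the one above, the resulting pool can be verified from the standard hugepage counters (Linux /proc paths; falls back gracefully elsewhere):

```shell
# inspect the hugepage pool and overall HugeTLB memory consumption
grep -E 'HugePages_(Total|Free)|Hugetlb' /proc/meminfo 2>/dev/null \
  || echo "/proc/meminfo not available"
```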
The same optimization idea can be applied to device-dax (persistent memory) and to high-order blocks in the buddy allocator, where tail struct pages can likewise be mapped onto the head page to reduce memory overhead.
Future Plans
Mitigating fragmentation by reallocating a fresh page when memory is freed, so that new contiguous huge pages can still be formed.
Improved compatibility with hot‑plug memory, ensuring reclaimed HVO‑saved pages are properly handled.
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.