Large Folios in the Linux Kernel: Benefits, Implementations, and Future Directions

Large folios in the Linux kernel combine multiple pages into one unit, which reduces TLB misses, page faults, and reclaim cost and allows more efficient compression. Filesystems such as XFS and bcachefs already support them, and recent patch series add multi-size THP (mTHP), large-folio swap-out and swap-in, the TAO allocator, NUMA balancing, and debug tools. OPPO's production deployment shows performance gains and motivates broader adoption and work on fragmentation mitigation.

OPPO Kernel Craftsman

Apr 19, 2024

In the Linux kernel, a folio contains one or more pages; a folio with more than one page is called a large folio (or large page). Large folios bring several benefits. They reduce TLB misses, for example through a 2 MiB PMD mapping or contiguous PTE mappings on ARM64. They reduce page faults: do_anonymous_page can map a whole large folio in a single fault, so the remaining pages of that folio never fault. They shrink the LRU and lower reclaim cost, because a large folio is reclaimed as a unit with less reverse-mapping overhead. Finally, they let zRAM/zsmalloc compress at a larger granularity, which lowers CPU utilization and improves the compression ratio.
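The fault and LRU savings above are simple arithmetic. The sketch below works it out for a 2 MiB region, assuming a 4 KiB base page and the idealized case where every allocation succeeds at the requested order and one fault maps the whole folio. The helper name folio_stats is hypothetical and exists only for illustration.

```python
PAGE_SIZE = 4 * 1024  # assumption: 4 KiB base page


def folio_stats(region_bytes, order, page_size=PAGE_SIZE):
    """Best-case counts for touching every page of a region backed
    entirely by folios of the given order (2**order base pages each)."""
    folio_bytes = page_size << order
    if region_bytes % folio_bytes:
        raise ValueError("region must be a multiple of the folio size")
    folios = region_bytes // folio_bytes
    return {
        "folio_kib": folio_bytes // 1024,
        "faults": folios,       # idealized: one fault maps one whole folio
        "lru_entries": folios,  # one LRU entry per folio
    }


if __name__ == "__main__":
    region = 2 * 1024 * 1024  # 2 MiB
    for order in (0, 4, 9):
        print(order, folio_stats(region, order))
```

With order 0 the region needs 512 faults and 512 LRU entries; with order 4 (64 KiB folios) it needs 32; with order 9 (a single 2 MiB folio) it needs one. Real workloads fall somewhere in between, since large allocations can fail and fall back to smaller orders.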

Filesystems that support large folios include afs, bcachefs, erofs (non-compressed), and xfs. Each declares its support by calling mapping_set_large_folios(); when mapping_large_folio_support() returns true, the page cache can allocate large folios to fill the xarray.

For anonymous pages, several patch series have been contributed. Ryan Roberts (ARM) introduced multi-size THP (mTHP), which allows large folios of various sizes to be allocated on page fault. His Transparent Contiguous PTEs series for ARM64 lets 16 contiguous PTEs share a single TLB entry via the CONT bit. His swap-out mTHP patch avoids splitting large folios during reclaim, unless they are already partially unmapped. A large-folio swap-in patch from OPPO (Chuanhua Han, Barry Song) swaps large folios in directly, which preserves the mTHP benefits on swap-heavy Android and embedded workloads.
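To make the CONT-bit condition concrete, here is a simplified user-space model of when a block of 16 PTEs could share one TLB entry. It assumes a 4 KiB granule, and it assumes the usual architectural requirements that the block is naturally aligned in both virtual and physical address space, physically contiguous, and uniform in its permissions. The function name can_use_cont_pte and the string-valued permissions are hypothetical; the real kernel logic operates on page-table entries and checks more state than this.

```python
CONT_PTES = 16  # PTEs covered by one CONT-bit TLB entry (4 KiB granule)


def can_use_cont_pte(first_vpn, pfns, prots):
    """Simplified eligibility check for one block of CONT_PTES entries.

    first_vpn: virtual page number of the first PTE in the block
    pfns:      physical frame numbers, one per PTE
    prots:     protection/attribute value, one per PTE
    """
    if len(pfns) != CONT_PTES or len(prots) != CONT_PTES:
        return False
    # Natural alignment of both the virtual and the physical block.
    if first_vpn % CONT_PTES or pfns[0] % CONT_PTES:
        return False
    # Physically contiguous frames.
    contiguous = all(pfns[i] == pfns[0] + i for i in range(CONT_PTES))
    # Every entry must carry the same attributes.
    same_prot = len(set(prots)) == 1
    return contiguous and same_prot


if __name__ == "__main__":
    frames = list(range(1600, 1616))  # 16 contiguous, 16-aligned frames
    print(can_use_cont_pte(32, frames, ["rw"] * 16))  # aligned block
    print(can_use_cont_pte(33, frames, ["rw"] * 16))  # misaligned VA
```

A 64 KiB mTHP folio that is fully mapped at an aligned address satisfies all of these conditions, which is why that size pairs naturally with CONT-PTE.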

Further work includes mTHP-friendly compression in zsmalloc/zram (Tangquan Zheng); the TAO allocator optimization (Yu Zhao), which abstracts memory into 4 KiB and large-folio zones to improve allocation and compaction; per-order mTHP allocation and swap-out counters (Barry Song); a debugfs interface to split a folio to any lower order (Zi Yan); NUMA-balancing support for multi-size THP (Baolin Wang); and an enhancement to MADV_FREE/lazyfreeing that avoids splitting folios (Lance Yang).
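The benefit of compressing at large-folio granularity can be illustrated in user space. The sketch below compares compressing sixteen 4 KiB pages one at a time against compressing the same 64 KiB as a single unit. It uses Python's zlib purely as a stand-in (zram typically uses other algorithms such as lzo or zstd), and the synthetic data, built from a small vocabulary of random 64-byte records, is an assumption chosen so that redundancy spans page boundaries. It is not a model of the zsmalloc/zram patch itself.

```python
import random
import zlib

PAGE = 4096
PAGES_PER_FOLIO = 16  # 16 x 4 KiB = one 64 KiB folio
RECORD = 64


def make_folio(seed=0):
    """Build 64 KiB of synthetic data from 256 distinct random records,
    so most repeats occur across pages rather than within one page."""
    rng = random.Random(seed)
    records = [bytes(rng.getrandbits(8) for _ in range(RECORD))
               for _ in range(256)]
    count = PAGE * PAGES_PER_FOLIO // RECORD
    return b"".join(rng.choice(records) for _ in range(count))


def compressed_sizes(data):
    """Return (total size when each page is compressed separately,
    size when the whole buffer is compressed as one unit)."""
    per_page = sum(len(zlib.compress(data[off:off + PAGE]))
                   for off in range(0, len(data), PAGE))
    whole = len(zlib.compress(data))
    return per_page, whole


if __name__ == "__main__":
    per_page, whole = compressed_sizes(make_folio())
    print("16 separate pages:", per_page, "bytes")
    print("one 64 KiB unit:  ", whole, "bytes")
```

Compressing the folio as one unit lets the compressor find matches across page boundaries and needs one call instead of sixteen, which is the intuition behind the better ratio and lower CPU cost. The actual gain depends on the data and the algorithm.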

OPPO has deployed dynamic large pages (mainly 64 KiB, leveraging CONT-PTE) in production kernels since 2023, with gains in performance and user experience. Future directions include broader filesystem support, reliable allocation guarantees similar to TAO, mainline swap-in support, hardware-offloaded compression, large-folio support in zswap, solutions to swap fragmentation, balancing performance gains against memory fragmentation, and handling partial unmapping of large folios by user space.

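The per-order mTHP counters are exposed as four stats files:

anon_alloc: large folios of this order successfully allocated on an anonymous page fault

anon_alloc_fallback: anonymous page faults that could not allocate this order and fell back to a smaller one

anon_swpout: large folios of this order swapped out in one piece, without splitting

anon_swpout_fallback: large folios of this order that had to be split before swap-out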
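A minimal sketch of reading these counters from user space follows. It assumes the per-size layout /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/, which is not stated in this article, and it collects whatever counter files are present because the names can differ between versions of the patch series and released kernels. The helpers read_mthp_stats and fallback_ratio are hypothetical names for illustration.

```python
from pathlib import Path

# Assumption: per-size mTHP directories live under this sysfs root.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")


def read_mthp_stats(root=THP_ROOT):
    """Return {"hugepages-64kB": {"anon_alloc": 123, ...}, ...} for every
    per-size directory that has a stats/ subdirectory."""
    stats = {}
    for size_dir in sorted(Path(root).glob("hugepages-*kB")):
        stats_dir = size_dir / "stats"
        if not stats_dir.is_dir():
            continue
        counters = {}
        for entry in sorted(stats_dir.iterdir()):
            try:
                counters[entry.name] = int(entry.read_text().strip())
            except (OSError, ValueError):
                continue  # skip unreadable or non-numeric files
        stats[size_dir.name] = counters
    return stats


def fallback_ratio(counters, ok="anon_alloc", failed="anon_alloc_fallback"):
    """Fraction of attempts that fell back; None if nothing was counted."""
    succeeded = counters.get(ok, 0)
    fell_back = counters.get(failed, 0)
    total = succeeded + fell_back
    return fell_back / total if total else None


if __name__ == "__main__":
    for size, counters in read_mthp_stats().items():
        print(size, counters, "fallback ratio:", fallback_ratio(counters))
```

A rising fallback ratio for a given size is a rough signal that memory is too fragmented to satisfy allocations of that order. On a kernel without these files the reader simply returns an empty dictionary.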
