How TencentOS “Ruyi” Solves Page‑Cache Overuse in Container Environments
This article explains the challenges of uncontrolled page‑cache growth in containerized workloads, reviews community attempts to limit it, and details TencentOS “Ruyi” memory‑QoS solutions—including cgroup‑level page‑cache limits, implementation details, and observed performance effects.
Introduction
TencentOS “Ruyi” is an OS‑side resource isolation solution targeting large‑scale container clusters. It provides QoS for CPU, I/O, memory, and network to improve resource utilization and reduce server costs in mixed online/offline workloads.
Background of Memory Isolation
In container environments each container has a memory quota, but the Linux page cache is not bounded by that quota: it can grow until it consumes free memory, delaying memory allocation for business workloads. Limiting page‑cache usage is therefore critical.
Community Solutions
Various community patches have attempted to limit the page cache, such as capping the proportion of memory the page cache may use (see the LWN article) and limiting negative‑dentry memory (see the LKML discussion). However, many of these proposals were not merged because of concerns about added kernel complexity.
"Ruyi" Memory QoS Design
TencentOS “Ruyi” explores several approaches to control container page‑cache usage.
Approach 1
Implement cgroup‑level dirty_background_ratio/dirty_ratio. This approach was not adopted because it interferes with I/O QoS and does not cover clean (non‑dirty) page cache.
Approach 2
Extend the existing global page‑cache limit to support cgroup‑level limits. A per‑cgroup page counter tracks page‑cache usage; when a new allocation would exceed the limit, the system attempts configurable reclamation before allowing the allocation.
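The accounting described above can be sketched in user-space Python. This is not the TencentOS kernel code; the class name, the reclaim callback, and the exact reclaim policy are all illustrative stand-ins for the per-cgroup page counter the article describes.

```python
# Illustrative model of a per-cgroup page-cache counter: charges are tracked
# against a limit, and a charge that would exceed the limit first attempts
# configurable reclamation. All names here are hypothetical.

class PageCacheCounter:
    def __init__(self, limit_pages, reclaim_cb):
        self.limit = limit_pages      # page-cache limit for this cgroup, in pages
        self.usage = 0                # pages currently charged
        self.reclaim_cb = reclaim_cb  # frees up to N pages, returns pages freed

    def try_charge(self, nr_pages):
        """Charge nr_pages; reclaim first if the limit would be exceeded."""
        if self.usage + nr_pages > self.limit:
            overflow = self.usage + nr_pages - self.limit
            self.usage -= min(self.usage, self.reclaim_cb(overflow))
        if self.usage + nr_pages > self.limit:
            return False              # caller must retry or escalate
        self.usage += nr_pages
        return True

    def uncharge(self, nr_pages):
        self.usage = max(0, self.usage - nr_pages)
```

For example, with a 100-page limit and a reclaim callback that always succeeds, charging 80 then 40 pages forces 20 pages to be reclaimed before the second charge is admitted.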
Non‑direct I/O reads first check the page cache via pagecache_get_page. If the cgroup’s page‑cache quota is exceeded, reclamation is triggered; if reclamation still fails after the configured number of retries, the process is OOM‑killed.
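The retry-then-OOM policy on that path can be sketched as follows. This is a user-space model, not kernel code: charge_with_retries, the reclaim callback, and the retry budget are hypothetical names standing in for the behavior described above.

```python
# Model of the read-path policy: retry reclamation a configured number of
# times before giving up and signaling an OOM condition.

class PageCacheOOM(Exception):
    """Raised when reclamation cannot make room within the retry budget."""

def charge_with_retries(usage, limit, nr_pages, reclaim, max_retries=3):
    """Return the new usage after charging nr_pages, or raise PageCacheOOM."""
    for _ in range(max_retries + 1):
        if usage + nr_pages <= limit:
            return usage + nr_pages   # charge fits; allocation proceeds
        # Over the limit: try to reclaim the overflow before retrying.
        usage -= min(usage, reclaim(usage + nr_pages - limit))
    raise PageCacheOOM("page-cache limit still exceeded after retries")
```

When reclaim frees enough pages the charge eventually succeeds; when reclaim makes no progress, the retry budget is exhausted and the OOM path is taken, mirroring the behavior described above.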
Users can view current page‑cache usage and statistics via memory.events, and control behavior with sysctl parameters such as vm.pagecache_limit_global, vm.pagecache_limit_ignore_dirty, and vm.pagecache_limit_ignore_slab, which work under both cgroup v1 and v2.
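One plausible reading of the ignore_dirty/ignore_slab knobs is that they exclude the corresponding pages from the total compared against the limit; the sketch below models that semantics. The exact accounting in TencentOS may differ, and accounted_pages is an illustrative name, not a real interface.

```python
# Model of how ignore-dirty / ignore-slab flags could affect which pages
# count toward the page-cache limit (assumed semantics, not kernel code).

def accounted_pages(clean, dirty, slab, ignore_dirty=False, ignore_slab=False):
    """Pages counted against the page-cache limit under the given flags."""
    total = clean
    if not ignore_dirty:
        total += dirty    # dirty pages are costly to reclaim (need writeback)
    if not ignore_slab:
        total += slab     # reclaimable slab (e.g. dentries, inodes)
    return total
```

Under this reading, setting ignore_dirty avoids counting pages that would require writeback before they could be reclaimed.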
Implementation Details
(Figures illustrating the architecture and metrics are omitted here.)
Results
Without limits, page‑cache memory continuously grows until it exhausts RAM. With the cgroup‑level limit enabled, page‑cache usage stabilizes at a configurable threshold, preventing OOM situations while still allowing reasonable file I/O performance.
Open Issues
Enabling page‑cache limits trades off some I/O throughput for more predictable memory availability. Users must balance these factors based on workload characteristics.
Tencent Architect
We share technical insights on storage, computing, and access, and explore industry-leading product technologies together.