
Why Do Your VMs Get OOM‑Killed? Uncovering Hidden Memory Overhead in KVM

This article investigates why virtual machines on an OpenStack IaaS platform experience OOM‑killer terminations despite reserved host memory, analyzes memory usage patterns of qemu‑kvm processes, and proposes practical solutions to mitigate unexpected OOM events.

360 Zhihui Cloud Developer

Introduction

The author, responsible for virtualization and container services on the 360 HULK cloud platform, explores the OOM Killer, a Linux kernel mechanism that terminates processes when the host runs out of memory.

Problem Description

Scenario: an OpenStack IaaS management platform whose compute nodes have 128 GB of RAM and run CentOS 7.2 with QEMU and KVM.

1 Problem Identification

VMs are unexpectedly killed by the OOM killer even though memory is not over‑committed and 12 GB (9.375% of the host's 128 GB) is reserved for the OS. In theory, the memory given to VMs should never exceed 90.625% of host memory, yet OOM kills still occur.

2 Problem Investigation

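Observations show that OS services use far less than the reserved 12 GB. After the affected VMs were restarted, memory used by the host OS itself stayed at around 4 GB. Even if every VM consumed its full allocation, expected host usage would be (128 - 12 + 4)/128 = 93.75%, which is still below the point at which the OOM killer should trigger. OS memory is therefore not the cause.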
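The percentages quoted so far can be reproduced with a few lines of arithmetic. This is only a worked check of the article's own figures (128 GB host, 12 GB reservation, about 4 GB of real OS usage); the variable and function names are illustrative.

```python
# Figures taken from the article; names are illustrative only.
HOST_GB = 128        # total RAM on the compute node
RESERVED_GB = 12     # memory held back for the host OS
OS_ACTUAL_GB = 4     # what the OS was observed to use in practice


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the host that is reserved for the OS.
reserved_pct = pct(RESERVED_GB, HOST_GB)                      # 9.375

# Ceiling on memory handed out to VMs when nothing is over-committed.
vm_ceiling_pct = pct(HOST_GB - RESERVED_GB, HOST_GB)          # 90.625

# Expected peak if every VM uses exactly its allocation and the OS uses 4 GB.
expected_peak_pct = pct(HOST_GB - RESERVED_GB + OS_ACTUAL_GB, HOST_GB)  # 93.75

print(reserved_pct, vm_ceiling_pct, expected_peak_pct)
```

On paper this leaves more than 6% of host memory free, which is why the OS reservation alone cannot explain the OOM kills.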

Further analysis of the qemu‑kvm processes shows that their resident memory (the “RES” column in top) is noticeably larger than the memory allocated to the VM. For a 4‑core 8 GB VM, the process actually uses 8.3‑8.9 GB; for a 2‑core 4 GB VM, it uses 4.6‑4.8 GB. The excess, roughly 0.3‑0.9 GB per VM, is consumed by the hypervisor process itself rather than by the guest's configured RAM.
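One way to repeat this measurement is to read the resident set size of a qemu‑kvm process from `/proc/<pid>/status` (the `VmRSS` field, which corresponds to what top shows as RES) and compare it with the VM's configured memory. The sketch below is a minimal illustration, not the author's tooling: the helper names are hypothetical, the sample status text is made up for the example, and "GB" is treated as GiB.

```python
import re


def vmrss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_status(pid):
    """Read /proc/<pid>/status for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return handle.read()


def overhead_mib(rss_kib, guest_mib):
    """Resident memory beyond the guest's configured RAM, in MiB."""
    return rss_kib // 1024 - guest_mib


# Made-up sample standing in for a real qemu-kvm process of an 8 GiB VM.
SAMPLE_STATUS = (
    "Name:\tqemu-kvm\n"
    "VmPeak:\t 9500000 kB\n"
    "VmRSS:\t 8912896 kB\n"
    "Threads:\t6\n"
)

rss = vmrss_kib(SAMPLE_STATUS)         # 8912896 KiB, i.e. 8704 MiB
extra = overhead_mib(rss, 8 * 1024)    # 512 MiB above the 8 GiB allocation
print(rss, extra)
```

On a real host, `read_status(pid)` would replace `SAMPLE_STATUS`, with the PID taken from the qemu‑kvm process of the VM in question.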

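Further research (see References) indicates that qemu‑kvm also allocates memory for the virtual devices it emulates, and that this memory is counted toward the VM's qemu‑kvm process on the host rather than toward the guest's configured RAM.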

3 Solution

Increase the OS reserved memory space to absorb the extra memory used by the VM.

On nodes with SSDs, raise the swap size (currently 4 GB) so the host has a larger buffer before the OOM killer is triggered.

Adjust OpenStack scheduling logic to reserve additional memory for VMs, though this is less universally applicable.
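To get a feel for how large the extra reservation or per‑VM padding would need to be, the sketch below plugs the article's figures into two simple models: a host reservation that covers OS usage plus every VM's overhead, and a placement check that pads each VM with its expected overhead. Everything here is an assumption made for illustration: the host is filled with 8 GB VMs only, each VM carries the worst observed overhead (about 0.9 GB, taken as 922 MiB), "GB" is treated as GiB, and the `Host` class and function names are hypothetical, not Nova's scheduler code. In Nova, the host‑level reservation is normally set with the `reserved_host_memory_mb` option; the article does not say which setting the authors changed.

```python
from dataclasses import dataclass

MIB_PER_GIB = 1024


def required_reservation_mib(vm_count, per_vm_overhead_mib, os_usage_mib):
    """Memory that must stay unallocated: OS usage plus every VM's overhead."""
    return os_usage_mib + vm_count * per_vm_overhead_mib


@dataclass
class Host:
    """Hypothetical host model; not an OpenStack data structure."""
    total_mib: int
    reserved_mib: int
    committed_mib: int = 0  # padded memory of VMs already placed

    def try_place(self, flavor_mib, overhead_mib):
        """Place a VM only if flavor plus overhead fits outside the reservation."""
        need = flavor_mib + overhead_mib
        if self.committed_mib + need > self.total_mib - self.reserved_mib:
            return False
        self.committed_mib += need
        return True


host_total = 128 * MIB_PER_GIB
reserved = 12 * MIB_PER_GIB
os_usage = 4 * MIB_PER_GIB
flavor = 8 * MIB_PER_GIB
overhead = 922  # assumed worst case, roughly 0.9 GiB per VM

# Without padding, the 12 GiB reservation lets the host accept 14 such VMs.
vms_without_padding = (host_total - reserved) // flavor                     # 14

# Reservation those 14 VMs would really need, and how far 12 GiB falls short.
needed = required_reservation_mib(vms_without_padding, overhead, os_usage)  # 17004
shortfall = needed - reserved                                               # 4716

# With per-VM padding, the same host stops accepting VMs one earlier.
host = Host(host_total, reserved)
placed = 0
while host.try_place(flavor, overhead):
    placed += 1                                                             # ends at 13

print(vms_without_padding, needed, shortfall, placed)
```

Under these assumptions the 12 GB reservation is about 4.6 GB too small for a host packed with 8 GB VMs, and the gap grows with smaller flavors because the per‑VM overhead is proportionally larger. Either a bigger host reservation or per‑VM padding at scheduling time closes it.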

References

https://lime-technology.com/forums/topic/48093-kvm-memory-leakingoverhead/

https://unix.stackexchange.com/questions/140322/kvm-killed-by-oomkiller
