Guest PEBS: Precise Event-Based Sampling for KVM Virtualization
Guest PEBS is a virtualization‑aware implementation of Intel’s Precise Event‑Based Sampling that Tencent Cloud integrated into KVM. It avoids per‑sample interrupt overhead and minimizes instruction‑skid, delivering low‑latency, fine‑grained hardware performance data inside virtual machines. This enables accurate profiling, latency reduction, and cost‑effective optimization of cloud workloads.
At the recent KVM Forum, the 2022 Global Enterprise KVM Open‑Source Contribution Ranking was released. Tencent Cloud ranked first among Chinese enterprises, and its self‑developed precise event sampling (PEBS) technology was recognized as a core breakthrough for KVM.
What is Guest PEBS?
Guest PEBS (Precise Event‑Based Sampling) is a virtualization‑aware implementation of Intel's PEBS that accurately matches hardware performance events with software code inside a virtual machine. It turns CPU execution data from a “black box” into a “white box”, allowing developers to observe CPU state and performance metrics in real time.
The technique addresses the long‑standing problem of inaccurate correlation between hardware performance events and software code in virtualized environments. By minimizing instruction‑skid (the distance between the event‑triggering instruction and the recorded instruction pointer) and eliminating frequent interrupt overhead, Guest PEBS provides fine‑grained, low‑overhead sampling.
Implementation Details
Traditional sampling relies on performance‑counter overflow interrupts, which introduce latency and instruction‑skid. With PEBS, which Guest PEBS makes available inside the VM, a counter overflow is handled not by an interrupt but by a microcode assist that writes a PEBS record (registers, the instruction pointer (EIP), a timestamp, and more) directly into a pre‑allocated buffer. Many records can accumulate before the hardware raises a performance‑monitoring interrupt (PMI), which dramatically reduces interrupt frequency.
The advantages are twofold: (1) reduced overhead, because a single interrupt is raised when the PEBS buffer reaches its threshold rather than one per sample, and (2) a much smaller gap between counter overflow and context capture, which minimizes instruction‑skid.
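To make the buffering idea concrete, here is a small simulation, not real driver code. The record and buffer layouts are hypothetical simplifications loosely modeled on the Debug Store area's PEBS fields (buffer base, current index, absolute maximum, interrupt threshold); actual PEBS records carry many more fields and vary by CPU generation. Each counter overflow stores a record silently, and a PMI is counted only when the index reaches the threshold.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified PEBS record. */
struct sim_pebs_record {
    uint64_t ip;   /* instruction pointer captured at overflow */
    uint64_t tsc;  /* timestamp */
};

/* Hypothetical buffer descriptor, loosely modeled on the DS save area. */
struct sim_pebs_buffer {
    struct sim_pebs_record *base;
    size_t index;         /* next free slot */
    size_t abs_max;       /* capacity in records */
    size_t threshold;     /* raise a PMI once index reaches this */
    unsigned long pmis;   /* interrupts raised so far */
};

#define SIM_CAPACITY 256

/* One counter overflow: store a record with no interrupt, and only
 * raise (count) a PMI when the threshold is reached. */
static void sim_pebs_overflow(struct sim_pebs_buffer *b, uint64_t ip, uint64_t tsc)
{
    if (b->index < b->abs_max) {
        b->base[b->index].ip = ip;
        b->base[b->index].tsc = tsc;
        b->index++;
    }
    if (b->index >= b->threshold) {
        b->pmis++;     /* a real PMI handler would drain the records here */
        b->index = 0;  /* buffer drained and reset */
    }
}

/* Simulate n overflows; threshold must be in [1, SIM_CAPACITY].
 * threshold == 1 behaves like classic one-interrupt-per-sample mode. */
static unsigned long sim_run(size_t n_overflows, size_t threshold)
{
    struct sim_pebs_record records[SIM_CAPACITY];
    struct sim_pebs_buffer b = {
        .base = records, .index = 0, .abs_max = SIM_CAPACITY,
        .threshold = threshold, .pmis = 0,
    };
    for (size_t i = 0; i < n_overflows; i++)
        sim_pebs_overflow(&b, 0x401000u + i, (uint64_t)i);
    return b.pmis;
}
```

For 1,000 overflows, `sim_run(1000, 1)` reports 1,000 interrupts while `sim_run(1000, 64)` reports 15, with the remaining 40 records still waiting in the buffer.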
Intel describes the overhead as negligible, but measurements show that each PEBS sample costs about 200 ns of CPU time and can be expected to cause some cache pollution.
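The 200 ns figure adds up quickly at high sampling rates. The helpers below are a back‑of‑the‑envelope sketch: the event rate and sample period are caller‑chosen assumptions, not measurements. For example, counting cycles on a 3 GHz core with a period of 100,000 yields 30,000 samples per second. At 200 ns each, that is 6 ms of CPU time per second, or about 0.6% of one core. Dropping the period to 10,000 raises the cost tenfold, to about 6%.

```c
/* Hypothetical helpers for estimating PEBS sampling cost. */

/* Samples per second when one sample is taken every `period` events. */
static double pebs_samples_per_sec(double events_per_sec, double period)
{
    return events_per_sec / period;
}

/* Fraction of one CPU consumed by sampling, given a per-sample cost in ns. */
static double pebs_cpu_fraction(double samples_per_sec, double ns_per_sample)
{
    return samples_per_sec * ns_per_sample / 1e9;
}
```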
Cloud‑Level Precise Sampling
While Linux has long supported PEBS on bare metal, exposing it to virtual machines required solving several challenges: distinguishing PEBS buffer addresses between host and guest, safely exposing a PEBS device model, and allowing VMs to dynamically acquire hardware performance‑monitoring resources without compromising host services.
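The resource‑sharing requirement can be illustrated with a toy arbiter. This is purely hypothetical and is not KVM's code; KVM's real virtual PMU is built on the host perf subsystem and is far more involved. The model tracks which general‑purpose counters the host and guest hold. The guest may borrow only free counters, and the host can always reclaim one, reporting that the guest's event must be rescheduled.

```c
#include <stdint.h>

#define NUM_GP_COUNTERS 8  /* hypothetical counter count */

struct pmu_arbiter {
    uint32_t host_mask;   /* counters used by host perf users */
    uint32_t guest_mask;  /* counters currently lent to the guest */
};

/* Lend one free counter to the guest; returns its index, or -1 if none. */
static int guest_acquire_counter(struct pmu_arbiter *a)
{
    uint32_t busy = a->host_mask | a->guest_mask;
    for (int i = 0; i < NUM_GP_COUNTERS; i++) {
        if (!(busy & (1u << i))) {
            a->guest_mask |= 1u << i;
            return i;
        }
    }
    return -1;
}

/* Host takes counter i; returns 1 if it had to be reclaimed from the guest. */
static int host_claim_counter(struct pmu_arbiter *a, int i)
{
    uint32_t bit = 1u << i;
    int reclaimed = (a->guest_mask & bit) != 0;
    a->guest_mask &= ~bit;
    a->host_mask |= bit;
    return reclaimed;
}

/* Guest returns counter i when its event is disabled. */
static void guest_release_counter(struct pmu_arbiter *a, int i)
{
    a->guest_mask &= ~(1u << i);
}
```

Giving the host unconditional priority is one way to satisfy "without compromising host services". The cost is that guest sampling may be interrupted and must tolerate losing a counter.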
Tencent Cloud’s kernel developers addressed these issues over several iterations, integrating guest PEBS support into KVM on Intel platforms (PEBS is an Intel feature), alongside related performance‑monitoring virtualization work for AMD platforms. The solution lets cloud developers collect accurate hardware performance data (e.g., memory bandwidth, cache usage, and branch‑prediction success rates) from within VMs. One practical use case is tracing packet‑processing latency in DPDK‑based forwarders, where PEBS helped pinpoint unexpected delays.
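From inside a guest, an application requests precise sampling through the standard Linux `perf_event_open` interface by setting `precise_ip`. Per the man page, 0 means arbitrary skid, 1 constant skid, 2 requested zero skid, and 3 required zero skid. On Intel CPUs the kernel satisfies `precise_ip > 0` with PEBS where the event supports it. The sketch below assumes the hypervisor actually exposes PEBS to the VM. If it does not, or if `perf_event_paranoid` forbids it, the call simply fails with an error. The helper names are illustrative.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill an attr for precise (PEBS-backed on Intel) cycle sampling. */
static void init_precise_cycles_attr(struct perf_event_attr *attr, __u64 period)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = period;  /* one sample every `period` cycles */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
    attr->precise_ip = 2;          /* request zero skid */
    attr->disabled = 1;            /* enable later via ioctl */
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
}

/* Open the event for `pid` (0 = calling thread) on any CPU.
 * Returns a file descriptor, or -1 with errno set. */
static int open_precise_cycles(pid_t pid, __u64 period)
{
    struct perf_event_attr attr;
    init_precise_cycles_attr(&attr, period);
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0);
}
```

From the command line, `perf record -e cycles:pp` requests the same precision level inside the guest.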
Guest PEBS has been upstreamed and adopted by other vendors, strengthening Tencent Cloud’s performance‑analysis stack. It empowers customers to perform code‑level hardware profiling, optimize hot paths, reduce latency tails, and lower overall compute costs.
In addition to PEBS, the team contributed several enhancements to the io_uring ecosystem, such as multi‑connection acceptance and thread‑affinity optimizations, further improving asynchronous I/O performance in cloud workloads.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.