
Performance Optimization Practices for Ctrip's CAT Monitoring System

This article details Ctrip's implementation and performance optimization of the CAT application monitoring system, covering its deployment, thread‑model redesign, client‑side computation offloading, double‑buffered reporting, and string handling improvements that reduced CPU usage and GC pressure.

Ctrip Technology

CAT, an open-source trace-based monitoring system originally developed at Dianping, was introduced at Ctrip in 2014. It now serves more than 70,000 clients and processes over 8,000 billion messages and 900 TB of monitoring traffic per day.

This rapid growth saturated server CPUs and caused frequent GC pauses, because real-time report calculation is CPU-intensive and the reports hold a large amount of data in memory.

Case 1 – Thread Model Optimization: The original BIO-like model dedicated one thread to each queue, which caused heavy context switching and lock contention. Decoupling queues from threads and introducing a selector-based, NIO-style model with a shared thread pool and priority scheduling reduced the thread count, eliminated unnecessary context switches, and lowered CPU usage to about 70% of the original.
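
The idea of decoupling queues from threads can be sketched roughly as follows. This is a minimal illustration, not CAT's actual code: the class names TaskQueue and SharedPoolDispatcher, the batch size, and the back-off interval are all assumptions. A small fixed pool of workers scans the queues in priority order, much as a selector loop picks ready channels, so the thread count no longer grows with the number of queues.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Consumer;

/** Hypothetical message queue that is not bound to a dedicated thread. */
final class TaskQueue {
    final String name;
    final int priority; // lower value = scanned first
    final Consumer<String> handler;
    final ConcurrentLinkedQueue<String> messages = new ConcurrentLinkedQueue<>();
    // At most one worker drains a queue at a time, so the handler itself needs no lock.
    final AtomicBoolean busy = new AtomicBoolean(false);

    TaskQueue(String name, int priority, Consumer<String> handler) {
        this.name = name;
        this.priority = priority;
        this.handler = handler;
    }

    void offer(String message) {
        messages.offer(message);
    }
}

/** Hypothetical dispatcher: a fixed pool of workers shared by all queues. */
final class SharedPoolDispatcher {
    private static final int BATCH = 64; // assumed batch size per visit
    private final List<TaskQueue> queues;
    private final ExecutorService pool;
    private volatile boolean running = true;

    SharedPoolDispatcher(List<TaskQueue> queues, int threads) {
        this.queues = new ArrayList<>(queues);
        this.queues.sort(Comparator.comparingInt(q -> q.priority));
        this.pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(this::workerLoop);
        }
    }

    private void workerLoop() {
        while (running || hasPending()) {
            if (!drainOneBatch()) {
                LockSupport.parkNanos(100_000L); // nothing ready: back off briefly
            }
        }
    }

    /** Serves the highest-priority queue that has messages and is not already being drained. */
    private boolean drainOneBatch() {
        for (TaskQueue q : queues) {
            if (q.messages.isEmpty() || !q.busy.compareAndSet(false, true)) {
                continue;
            }
            try {
                for (int n = 0; n < BATCH; n++) {
                    String message = q.messages.poll();
                    if (message == null) {
                        break;
                    }
                    q.handler.accept(message);
                }
            } finally {
                q.busy.set(false);
            }
            return true;
        }
        return false;
    }

    private boolean hasPending() {
        for (TaskQueue q : queues) {
            if (!q.messages.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Stops accepting work, lets workers drain what is left, then waits for them. */
    void shutdown() {
        running = false;
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch, adding another queue costs one more object rather than one more thread, and the per-queue busy flag replaces handler-level locking with a single compare-and-set per batch.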

Case 2 – Client-Side Computation: The heavy Transaction and Event report calculations were moved to the client. Each client aggregates statistics locally and sends pre-computed results, which cut server-side CPU consumption for these reports from 7.5 cores to a negligible amount, with memory usage under 10 MB per client.
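
Local aggregation of this kind might look like the sketch below. The names ClientSideAggregator and TransactionStats, the choice of statistics, and the "type:name" key format are assumptions made for illustration; the article does not describe CAT's actual client data structures. The point is that the client keeps one small record per type and name, and ships only those records instead of every raw message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-(type, name) statistics kept on the client. */
final class TransactionStats {
    long count;
    long failures;
    long sumMicros;
    long maxMicros;

    void add(long durationMicros, boolean success) {
        count++;
        if (!success) {
            failures++;
        }
        sumMicros += durationMicros;
        maxMicros = Math.max(maxMicros, durationMicros);
    }

    double avgMicros() {
        return count == 0 ? 0.0 : (double) sumMicros / count;
    }
}

/** Hypothetical aggregator: folds each transaction into a running summary. */
final class ClientSideAggregator {
    private Map<String, TransactionStats> current = new HashMap<>();

    synchronized void record(String type, String name, long durationMicros, boolean success) {
        current.computeIfAbsent(type + ":" + name, k -> new TransactionStats())
               .add(durationMicros, success);
    }

    /** Swaps in an empty map and returns the finished one; a sender would serialize and upload it. */
    synchronized Map<String, TransactionStats> flush() {
        Map<String, TransactionStats> snapshot = current;
        current = new HashMap<>();
        return snapshot;
    }
}
```

A background task on the client could call flush() on a fixed interval and upload the snapshot, so the server only merges summaries rather than recomputing them from raw traffic.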

Case 3 – Double-Buffered Reporting: Creating a fresh report every hour incurred allocation overhead and GC churn. Instead, two reports are pre-allocated and rotated so that they stay in the Old generation, with periodic cleanup of their contents. This lowered Young GC frequency by 40% and Full GC from about 20 runs per day to 3.
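
The rotation can be sketched as below. HourlyReport and DoubleBufferedReports are hypothetical names, and the cleanup rule (drop a name that received no hits during the buffer's last active hour) is an assumption; the article only says that cleanup is periodic. The sketch is single-threaded for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical report that is reset and reused rather than reallocated. */
final class HourlyReport {
    private long hourStartMillis;
    // name -> single-element counter array, mutated in place
    private final Map<String, long[]> counters = new HashMap<>();

    void record(String name) {
        counters.computeIfAbsent(name, k -> new long[1])[0]++;
    }

    long count(String name) {
        long[] c = counters.get(name);
        return c == null ? 0L : c[0];
    }

    int trackedNames() {
        return counters.size();
    }

    long hourStartMillis() {
        return hourStartMillis;
    }

    /** Prepares the report for a new hour without allocating a new object graph. */
    void resetFor(long newHourStartMillis) {
        hourStartMillis = newHourStartMillis;
        // Assumed cleanup rule: names that were idle last time are dropped.
        counters.values().removeIf(c -> c[0] == 0L);
        for (long[] c : counters.values()) {
            c[0] = 0L; // live counters are zeroed in place
        }
    }
}

/** Hypothetical holder for the two pre-allocated reports. */
final class DoubleBufferedReports {
    private final HourlyReport[] reports = { new HourlyReport(), new HourlyReport() };
    private int active = 0;

    HourlyReport current() {
        return reports[active];
    }

    HourlyReport previous() {
        return reports[1 - active];
    }

    /** Called at each hour boundary: the older buffer is reset and becomes current. */
    HourlyReport rotate(long newHourStartMillis) {
        active = 1 - active;
        reports[active].resetFor(newHourStartMillis);
        return reports[active];
    }
}
```

Because the same two objects and most of their counters survive every rotation, long-lived report state is allocated once instead of being rebuilt and promoted every hour.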

Case 4 – String Handling Optimization: Repeatedly creating String objects for fields such as type, name, status, and data caused significant memory and CPU overhead. By using a BytesWrapper and a custom BytesHashMap to compare raw byte streams directly, the system eliminated most string allocations and reduced Young GC by a further 40%.
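
A byte-level lookup along these lines might be sketched as follows. The article names BytesWrapper and BytesHashMap but does not show them, so the fields and methods here are assumptions. This sketch also swaps the custom BytesHashMap for a plain java.util.HashMap with a reusable probe key, which is a simplification and not the article's implementation. A String is decoded only the first time a given byte sequence is seen.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch of a view over a byte range that hashes and compares the raw bytes. */
final class BytesWrapper {
    private byte[] data;
    private int offset;
    private int length;
    private int hash;

    /** Points this wrapper at a new range; no bytes are copied. */
    BytesWrapper reset(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
        int h = 1;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + data[i];
        }
        this.hash = h;
        return this;
    }

    /** Returns an independent wrapper that owns a private copy of the bytes. */
    BytesWrapper copy() {
        byte[] own = Arrays.copyOfRange(data, offset, offset + length);
        return new BytesWrapper().reset(own, 0, length);
    }

    String decode() {
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BytesWrapper)) {
            return false;
        }
        BytesWrapper other = (BytesWrapper) o;
        if (other.length != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (data[offset + i] != other.data[other.offset + i]) {
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical cache mapping raw field bytes to one shared String. Not thread-safe. */
final class FieldNameCache {
    private final Map<BytesWrapper, String> known = new HashMap<>();
    private final BytesWrapper probe = new BytesWrapper(); // reused lookup key, never stored

    String lookup(byte[] buffer, int offset, int length) {
        String cached = known.get(probe.reset(buffer, offset, length));
        if (cached == null) {
            cached = probe.decode();         // allocate a String only on first sight
            known.put(probe.copy(), cached); // store a key that owns its bytes
        }
        return cached;
    }

    int size() {
        return known.size();
    }
}
```

On a cache hit the lookup allocates nothing: the probe wrapper is reused and the comparison runs directly over the receive buffer, which suits fields such as type and name that repeat across many messages.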

The overall lessons emphasize minimizing object allocation, reducing lock contention, leveraging client‑side processing, and reusing memory structures to achieve substantial CPU and GC improvements in large‑scale monitoring systems.

Tags: Java, monitoring, Performance Optimization, CAT, GC, Thread model