Meituan Java RASP: Architecture, Challenges, and Performance Optimizations
The article details Meituan's large‑scale RASP (Runtime Application Self‑Protection) rollout, covering deployment challenges such as heterogeneous Java environments, performance impact of agent injection, upgrade difficulties, monitoring complexities, and the engineering solutions—including agentmain/premain hybrid loading, gray‑scale upgrades, hot‑update plugin architecture, and performance optimizations—validated with concrete metrics and benchmarks.
Background
RASP (Runtime Application Self‑Protection) monitors and blocks attacks inside a running Java application by instrumenting bytecode at the JVM level. Meituan’s Java RASP detects command execution, file access, deserialization, JNDI, SQL injection and other threats.
Challenges of Large‑Scale Deployment
Java agents can be loaded either with agentmain (dynamic injection) or premain (JVM start‑time injection). Meituan’s services run on dozens of release platforms, making premain impractical for rapid coverage, so the team initially chose agentmain. The environment is highly heterogeneous:
Deployment forms: physical machines, hosts, rich containers, lightweight containers.
Release systems: at least three distinct pipelines.
Web middleware: Spring Boot, Jetty, Tomcat, WebLogic, custom frameworks.
JDK distributions: OpenJDK, OracleJDK, MJDK, Kona, Dragonwell, BiSheng, etc.
Process count per host: from a few to thousands, with lifetimes ranging from minutes to years.
Figure 1 shows the JDK version distribution across Meituan’s fleet.
Performance Impact of Agent Injection
Agentmain injects RASP into the same JVM as the business logic, yielding the lowest isolation level (Figure 2). Bugs or incompatibilities can therefore affect the service directly. Two key metrics spike during injection:
cpu.busy can jump from 0 % to 50 % within a minute (Figure 3).
TP9999 (99.99th‑percentile latency) can rise from 5 ms to 1000 ms (Figure 4).
Upgrade Difficulty
A Java Agent cannot be reloaded without restarting the JVM; any RASP version upgrade therefore requires a full service restart. Long‑running services (e.g., core big‑data nodes) may stay up for months, making timely upgrades impossible. Figure 5 illustrates the native Java Agent lifecycle and why a new JAR cannot replace the old one without a JVM restart.
Monitoring Complexity
Loading RASP can trigger CPU spikes, latency spikes, memory leaks, core dumps, latent JDK bugs, thread deadlocks and long GC pauses, so each fault type has to be monitored. Figure 6 shows the fault‑type distribution observed in production.
RASP Architecture
Meituan’s RASP consists of a Golang daemon, a Java agent that uses ASM to rewrite bytecode, and a plugin JAR that contains detection logic. Configuration is pushed from a central management console (Figure 7).
Hybrid Loading Strategy (agentmain + premain)
Agentmain provides fast, no‑restart deployment but incurs CPU jitter. Premain offers stable performance but requires JVM start‑time parameters. Meituan combined both: a lightweight premain JAR (only enabling the Can‑Redefine‑Classes flag) is loaded at start‑up to eliminate the initial CPU surge, while the full agentmain injection runs later to activate detection.
Manifest-Version: 1.0
Premain-Class: com.meituan.rasp.agent.RaspAgent
Agent-Class: com.meituan.rasp.agent.RaspAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true
Can-Set-Native-Method-Prefix: true
Figure 10 compares cpu.busy before and after applying the empty premain package; the spike disappears.
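The split between a cheap premain and a later agentmain activation can be sketched as follows. This is a minimal illustration, not Meituan's code: the class name RaspAgentSketch, the activate() helper and the idempotency flag are assumptions. Only the premain and agentmain signatures come from the standard java.lang.instrument contract.

```java
import java.lang.instrument.Instrumentation;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical agent entry point. In a real agent JAR this class would be
 * public and named by Premain-Class / Agent-Class in the manifest.
 */
final class RaspAgentSketch {
    private static final AtomicBoolean ACTIVATED = new AtomicBoolean(false);
    private static volatile Instrumentation instrumentation;

    /** Invoked by the JVM at start-up when launched with -javaagent:<jar>. */
    public static void premain(String args, Instrumentation inst) {
        // Deliberately cheap: keep the handle, register no transformers yet.
        instrumentation = inst;
    }

    /** Invoked by the JVM when the agent is attached to a running process. */
    public static void agentmain(String args, Instrumentation inst) {
        if (instrumentation == null) {
            instrumentation = inst; // process was started without the premain JAR
        }
        activate();
    }

    /** Returns true only for the call that actually switches detection on. */
    static boolean activate() {
        if (!ACTIVATED.compareAndSet(false, true)) {
            return false; // a repeated attach is a no-op
        }
        // A real agent would register its ClassFileTransformers here.
        return true;
    }

    static boolean isActivated() {
        return ACTIVATED.get();
    }
}
```

Guarding activation with compareAndSet is a design choice of this sketch: it keeps a second attach from registering transformers twice.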
Premain Phase‑2: Bytecode Modification Impact
Even with a stable premain, modifying bytecode at runtime forces the JVM into a safepoint (stop‑the‑world). The flame graph in Figure 11 shows that VM_RedefineClasses::doit dominates the pause time. Experiments (Figure 12) reveal that STW duration grows with the number of classes transformed and, to a lesser extent, with request QPS.
To mitigate the impact, Meituan moves the bulk of bytecode transformation to the JVM start‑up phase (when traffic is absent) and avoids large‑scale transformations during peak load.
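Because the pause grows with the number of classes redefined at once, one possible complement to start‑up transformation is to retransform in small batches with a gap between them. The article does not say Meituan does this; the sketch below is an assumption, and batchSize and pauseMillis are hypothetical tuning knobs. It uses only the standard Instrumentation.isModifiableClass and Instrumentation.retransformClasses calls.

```java
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.util.ArrayList;
import java.util.List;

final class BatchedRetransform {
    /** Splits items into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /** Retransforms the targets batch by batch, sleeping between batches. */
    static void retransform(Instrumentation inst, List<Class<?>> targets,
                            int batchSize, long pauseMillis)
            throws UnmodifiableClassException, InterruptedException {
        List<Class<?>> modifiable = new ArrayList<>();
        for (Class<?> c : targets) {
            if (inst.isModifiableClass(c)) { // skip classes the JVM refuses to redefine
                modifiable.add(c);
            }
        }
        for (List<Class<?>> batch : partition(modifiable, batchSize)) {
            // Each call is its own safepoint operation, so each pause covers one batch.
            inst.retransformClasses(batch.toArray(new Class<?>[0]));
            Thread.sleep(pauseMillis);
        }
    }
}
```

Batching trades one long pause for several shorter ones; the total work is not reduced.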
Hot‑Update Plugin Design
RASP splits its code into two parts: a stable agent that only loads a minimal bootstrap, and a frequently updated plugin JAR. The plugin is loaded by a custom RaspClassLoader. When a new version arrives, the old plugin instance and its class loader are discarded, and a fresh class loader loads the new JAR (Figures 8‑9).
The JAR composition is:
rasp‑boot.jar : global variables, loaded by BootstrapClassLoader.
rasp‑agent.jar : entry point, loaded by AppClassLoader.
rasp‑plugin.jar : core detection logic, loaded by RaspClassLoader.
Script.class : script engine, loaded by ScriptClassLoader.
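The class‑loader swap behind the hot update can be sketched as follows. This is an illustration under stated assumptions: a plain URLClassLoader stands in for RaspClassLoader, and the DetectionPlugin interface, PluginManager and NoopPlugin names are hypothetical. NoopPlugin exists only so the sketch runs without a plugin JAR on disk.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical contract between the stable agent and the replaceable plugin. */
interface DetectionPlugin {
    String version();
}

/** Stand-in implementation; a real one would live in rasp-plugin.jar and be public. */
final class NoopPlugin implements DetectionPlugin {
    @Override public String version() { return "noop"; }
}

final class PluginManager {
    private static final class Loaded {
        final URLClassLoader loader;
        final DetectionPlugin plugin;
        Loaded(URLClassLoader loader, DetectionPlugin plugin) {
            this.loader = loader;
            this.plugin = plugin;
        }
    }

    private final AtomicReference<Loaded> current = new AtomicReference<>();

    /** Loads implClassName through a fresh loader, then retires the old loader. */
    DetectionPlugin reload(URL[] pluginJars, String implClassName) {
        URLClassLoader loader =
                new URLClassLoader(pluginJars, DetectionPlugin.class.getClassLoader());
        DetectionPlugin plugin;
        try {
            plugin = (DetectionPlugin) Class.forName(implClassName, true, loader)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | RuntimeException e) {
            closeQuietly(loader); // keep the previous plugin active on failure
            throw new IllegalStateException("plugin load failed: " + implClassName, e);
        }
        Loaded old = current.getAndSet(new Loaded(loader, plugin));
        if (old != null) {
            // Once nothing references the old loader and its classes, they can be collected.
            closeQuietly(old.loader);
        }
        return plugin;
    }

    DetectionPlugin current() {
        Loaded loaded = current.get();
        return loaded == null ? null : loaded.plugin;
    }

    private static void closeQuietly(URLClassLoader loader) {
        try { loader.close(); } catch (IOException ignored) { }
    }
}
```

Swapping the reference before closing the old loader means callers never observe a missing plugin. A production version would also have to wait for in‑flight calls into the old plugin to finish before closing it.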
Runtime Performance Optimizations
Key techniques include:
Traffic warm‑up: initial 1 % traffic to pre‑heat detection logic.
Soft‑degrade: disable high‑cost hooks when CPU or latency exceeds thresholds, falling back to observation mode.
No bytecode rollback on unload: keep transformed classes but disable their hooks to avoid STW during plugin removal.
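Both soft‑degrade and the no‑rollback unload rely on hooks that stay woven into the bytecode but consult a runtime switch before doing any work. A minimal sketch of such a switch follows. It is an assumption, not the article's implementation: it watches only average hook latency over a fixed window, whereas the article also mentions CPU thresholds, and the HookSwitch name and its parameters are hypothetical.

```java
final class HookSwitch {
    private final long maxAvgNanos;  // hypothetical latency budget per hook call
    private final int windowSize;    // number of calls per evaluation window
    private volatile boolean enabled = true;
    private long calls;
    private long totalNanos;

    HookSwitch(long maxAvgNanos, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.maxAvgNanos = maxAvgNanos;
        this.windowSize = windowSize;
    }

    /** Cheap volatile read that a woven hook performs before running detection. */
    boolean shouldDetect() {
        return enabled;
    }

    /** Records one hook execution; trips the switch if the window average is too high. */
    synchronized void record(long elapsedNanos) {
        calls++;
        totalNanos += elapsedNanos;
        if (calls >= windowSize) {
            if (totalNanos / calls > maxAvgNanos) {
                enabled = false; // degrade: hooks stay in place but skip detection
            }
            calls = 0;
            totalNanos = 0;
        }
    }

    void enable()  { enabled = true; }

    /** Used on plugin unload: no retransformation, so no extra safepoint. */
    void disable() { enabled = false; }
}
```

A hook body would check shouldDetect() first, time its detection logic with System.nanoTime(), and pass the elapsed value to record(). The synchronized method keeps the sketch simple; a hot path would likely use striped or lock‑free counters instead.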
Global monitoring tracks host injection coverage, core‑dump count, high‑peak bytecode modifications, and circuit‑breaker timeouts (Figure 15). JVM‑level metrics such as Metaspace usage are collected; if free Metaspace falls below a threshold, RASP injection is blocked to prevent out‑of‑memory crashes.
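The Metaspace guard can be approximated from inside the JVM with the standard java.lang.management API, as sketched below. The article's Golang daemon may collect this metric differently, and the class name, the minFreeBytes threshold and the decision to allow injection when no maximum is configured are all assumptions of this sketch. The pool name "Metaspace" is the one HotSpot reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceGuard {
    /** Pure decision rule: is there at least minFreeBytes left below the cap? */
    static boolean hasHeadroom(long usedBytes, long maxBytes, long minFreeBytes) {
        if (maxBytes < 0) {
            return true; // MemoryUsage reports -1 when no maximum is defined
        }
        return maxBytes - usedBytes >= minFreeBytes;
    }

    /** Reads the live Metaspace pool and applies the rule above. */
    static boolean injectionAllowed(long minFreeBytes) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null
                        || hasHeadroom(usage.getUsed(), usage.getMax(), minFreeBytes);
            }
        }
        return true; // pool not found (non-HotSpot JVM): this sketch does not block
    }
}
```

Keeping the rule in a pure function makes the threshold easy to unit‑test without a constrained JVM.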
Performance Benchmarks
Under an 8‑core/8 GB testbed with QPS ranging from 20 to 500, RASP added between 0.06 % and 3.73 % CPU overhead (Table 1). Heap memory grew by ~180 MB on average, and Metaspace by ~2 MB, both comparable to open‑source RASP solutions. Latency remained under 5 ms for the majority of requests.
CPU overhead by QPS:
QPS 20: Δ 0.71 %
QPS 100: Δ 0.06 %
QPS 200: Δ 0.15 %
QPS 300: Δ 0.66 %
QPS 400: Δ 2.97 %
QPS 500: Δ 3.73 %
Memory usage increase at peak QPS is about 200 MB, confirming low overhead.
Supported Vulnerability Types & Engine Choice
Meituan RASP detects command execution (including native methods), SQL injection, file access, deserialization, JNDI, expression injection, etc. In a benchmark comparing a Java‑based detection engine with a JavaScript‑based one, a 100,000‑iteration loop averaged about 6 ms in Java versus 585 ms in JavaScript, which is why the detection engine is implemented in Java.
Detection Script Lifecycle & Blocking Flow
Scripts implement a defined interface; the agent periodically checks for updates, loads new classes via a dedicated class loader, and instantiates them. Bytecode weaving inserts hooks at method before, return, and throw points (Figure 16). When a hook returns a block exception, execution stops; when it returns a replacement object, the method returns the patched value; otherwise execution proceeds unchanged.
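The three outcomes of a hook (block, replace, proceed) can be modeled in plain Java as below. In the real agent this dispatch is woven into the target method's bytecode; here a Supplier stands in for the original method body, and the HookOutcome, HookDispatch and AttackBlockedException names are hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical exception thrown when a hook decides to block the call. */
final class AttackBlockedException extends RuntimeException {
    AttackBlockedException(String reason) { super(reason); }
}

/** Result a detection hook hands back to the woven code. */
final class HookOutcome {
    enum Kind { PROCEED, REPLACE, BLOCK }

    final Kind kind;
    final Object replacement; // only meaningful for REPLACE
    final String reason;      // only meaningful for BLOCK

    private HookOutcome(Kind kind, Object replacement, String reason) {
        this.kind = kind;
        this.replacement = replacement;
        this.reason = reason;
    }

    static HookOutcome proceed()             { return new HookOutcome(Kind.PROCEED, null, null); }
    static HookOutcome replace(Object value) { return new HookOutcome(Kind.REPLACE, value, null); }
    static HookOutcome block(String reason)  { return new HookOutcome(Kind.BLOCK, null, reason); }
}

final class HookDispatch {
    /** Applies the hook's decision around the original method body. */
    @SuppressWarnings("unchecked")
    static <T> T guard(HookOutcome outcome, Supplier<T> original) {
        switch (outcome.kind) {
            case BLOCK:
                throw new AttackBlockedException(outcome.reason); // execution stops here
            case REPLACE:
                return (T) outcome.replacement; // caller sees the patched value
            default:
                return original.get();          // untouched original behavior
        }
    }
}
```

For example, HookDispatch.guard(HookOutcome.replace("patched"), () -> "real") returns "patched" without running the original body, while a BLOCK outcome surfaces to the caller as an AttackBlockedException.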
Conclusion
After years of iteration, Meituan RASP now covers the vast majority of Java services, supports multiple web containers, and detects a wide range of vulnerabilities with minimal performance impact. Ongoing plans include supporting newer container forms, further reducing injection overhead, and automating configuration distribution.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
