Meituan Java RASP: Architecture, Challenges, and Performance Optimizations

This article describes Meituan's large-scale rollout of RASP (Runtime Application Self-Protection) for Java. It covers the deployment challenges (heterogeneous Java environments, the performance impact of agent injection, upgrade difficulty, and monitoring complexity) and the engineering solutions: hybrid agentmain/premain loading, gray-scale upgrades, a hot-update plugin architecture, and runtime performance optimizations, supported by production metrics and benchmarks.

Meituan Technology Team

Background

RASP (Runtime Application Self‑Protection) monitors and blocks attacks inside a running Java application by instrumenting bytecode at the JVM level. Meituan’s Java RASP detects command execution, file access, deserialization, JNDI, SQL injection and other threats.

Challenges of Large‑Scale Deployment

A Java agent can be loaded in two ways: through agentmain (dynamic attach to an already running JVM) or through premain (the -javaagent option at JVM start-up). Meituan's services run on dozens of release platforms, and premain would require changing the start-up parameters on each of them, which makes it impractical for rapid coverage. The team therefore initially chose agentmain. The environment is highly heterogeneous:

Deployment forms: physical machines, hosts, rich containers, lightweight containers.

Release systems: at least three distinct pipelines.

Web middleware: Spring Boot, Jetty, Tomcat, WebLogic, custom frameworks.

JDK versions: OpenJDK, OracleJDK, MJDK, Kona, Dragonwell, BiSheng, etc.

Process count per host: from a few to thousands, with lifetimes ranging from minutes to years.

Figure 1 shows the JDK version distribution across Meituan’s fleet.

Figure 1: RASP JDK version distribution

Performance Impact of Agent Injection

Agentmain injects RASP into the same JVM as the business logic, yielding the lowest isolation level (Figure 2). Bugs or incompatibilities can therefore affect the service directly. Two key metrics spike during injection:

cpu.busy can jump from 0 % to 50 % within a minute (Figure 3).

TP9999 (99.99th‑percentile latency) can rise from 5 ms to 1000 ms (Figure 4).

Figure 3: cpu.busy spike
Figure 4: TP9999 spike

Upgrade Difficulty

A Java Agent cannot be reloaded without restarting the JVM; any RASP version upgrade therefore requires a full service restart. Long‑running services (e.g., core big‑data nodes) may stay up for months, making timely upgrades impossible. Figure 5 illustrates the native Java Agent lifecycle and why a new JAR cannot replace the old one without a JVM restart.

Figure 5: Java Agent upgrade process

Monitoring Complexity

Loading RASP can cause CPU spikes, latency spikes, memory leaks, core dumps, thread deadlocks and GC pauses, and it can trigger latent JDK bugs. Figure 6 shows the fault-type distribution observed in production.

Figure 6: RASP fault distribution

RASP Architecture

Meituan’s RASP consists of a Golang daemon, a Java agent that uses ASM to rewrite bytecode, and a plugin JAR that contains detection logic. Configuration is pushed from a central management console (Figure 7).

Figure 7: RASP configuration distribution

Hybrid Loading Strategy (agentmain + premain)

Agentmain provides fast, no-restart deployment but incurs CPU jitter. Premain offers stable performance but requires JVM start-time parameters. Meituan combined both: a lightweight premain JAR that does little more than declare the agent's capabilities (such as Can-Redefine-Classes) is loaded at start-up to eliminate the initial CPU surge, while the full agentmain injection runs later to activate detection. The manifest names the same class as both Premain-Class and Agent-Class:

Manifest-Version: 1.0
Premain-Class: com.meituan.rasp.agent.RaspAgent
Agent-Class: com.meituan.rasp.agent.RaspAgent
Can-Redefine-Classes: true
Can-Retransform-Classes: true
Can-Set-Native-Method-Prefix: true
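To make the two entry points concrete, here is a minimal sketch of a class that could sit behind both Premain-Class and Agent-Class. Only the premain/agentmain method signatures come from the java.lang.instrument contract. The class name RaspAgentSketch, the activateDetection helper, and the idea that premain merely stores the Instrumentation handle are illustrative assumptions, not Meituan's implementation.

```java
import java.lang.instrument.Instrumentation;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: one class used as both Premain-Class and Agent-Class.
final class RaspAgentSketch {
    private static volatile Instrumentation instrumentation;
    private static volatile boolean loadedAtStartup;
    private static final AtomicBoolean DETECTION_ACTIVE = new AtomicBoolean(false);

    // Invoked by the JVM before main() when started with -javaagent:<jar>.
    // Kept lightweight on purpose: it only remembers the handle.
    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
        loadedAtStartup = true;
    }

    // Invoked by the JVM when the agent is attached to a running process.
    public static void agentmain(String args, Instrumentation inst) {
        if (instrumentation == null) {
            instrumentation = inst;
        }
        activateDetection();
    }

    // Returns true only for the first activation, so repeated attaches are harmless.
    static boolean activateDetection() {
        if (!DETECTION_ACTIVE.compareAndSet(false, true)) {
            return false;
        }
        // A real agent would register its ClassFileTransformer here (omitted).
        return true;
    }

    static boolean loadedAtStartup() { return loadedAtStartup; }
    static boolean detectionActive() { return DETECTION_ACTIVE.get(); }
}
```

In this sketch a start-up load leaves detection off, and a later attach turns it on exactly once.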

Figure 10 compares cpu.busy before and after the lightweight premain JAR is applied; with it in place, the spike disappears.

Figure 10: cpu.busy before/after

Premain Phase‑2: Bytecode Modification Impact

Even with a stable premain, modifying bytecode at runtime forces the JVM into a safepoint (stop‑the‑world). The flame graph in Figure 11 shows that VM_RedefineClasses::doit dominates the pause time. Experiments (Figure 12) reveal that STW duration grows with the number of classes transformed and, to a lesser extent, with request QPS.

Figure 11: Flame graph of class redefinition
Figure 12: STW vs QPS

To mitigate the impact, Meituan moves the bulk of bytecode transformation to the JVM start‑up phase (when traffic is absent) and avoids large‑scale transformations during peak load.
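One way to act on these findings is to gate transformation on the current load and to cap how many classes go into a single redefinition request. The sketch below is an illustration under stated assumptions: the RetransformPlanner class, the batch size, and the QPS threshold are hypothetical, and the article does not say whether Meituan batches transformations this way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: decides when to transform and how to split the work.
final class RetransformPlanner {
    private final int maxBatchSize;
    private final double peakQpsThreshold;

    RetransformPlanner(int maxBatchSize, double peakQpsThreshold) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.peakQpsThreshold = peakQpsThreshold;
    }

    // Transform freely during start-up (no traffic yet); afterwards only
    // while the observed QPS is below the configured peak threshold.
    boolean mayTransformNow(boolean startupPhase, double currentQps) {
        return startupPhase || currentQps < peakQpsThreshold;
    }

    // Splits the pending classes into batches of at most maxBatchSize, so each
    // redefinition request (and its safepoint pause) covers fewer classes.
    <T> List<List<T>> batches(List<T> pending) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < pending.size(); i += maxBatchSize) {
            int end = Math.min(i + maxBatchSize, pending.size());
            out.add(new ArrayList<>(pending.subList(i, end)));
        }
        return out;
    }
}
```

Each batch would then be handed to Instrumentation.retransformClasses in turn, trading one long pause for several shorter ones.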

Hot‑Update Plugin Design

RASP splits its code into two parts: a stable agent that only loads a minimal bootstrap, and a frequently updated plugin JAR. The plugin is loaded by a custom RaspClassLoader. When a new version arrives, the old plugin instance and its class loader are discarded, and a fresh class loader loads the new JAR (Figures 8‑9).

Figure 8: Plugin split before/after
Figure 9: RASP classloader hierarchy

The JAR composition is:

rasp-boot.jar: global variables, loaded by BootstrapClassLoader.

rasp-agent.jar: entry point, loaded by AppClassLoader.

rasp-plugin.jar: core detection logic, loaded by RaspClassLoader.

Script.class: script engine, loaded by ScriptClassLoader.
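The discard-and-reload step for the plugin can be sketched with a plain URLClassLoader. This is an assumption-laden illustration: PluginHolder and its methods are hypothetical names, and the real RaspClassLoader is not shown in the article, so its delegation rules may differ.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical holder that swaps the plugin class loader on each update.
final class PluginHolder {
    private URLClassLoader current;
    private int generation;

    // Creates a fresh loader for the new plugin JAR(s), publishes it,
    // then closes the previous loader.
    synchronized ClassLoader reload(URL[] pluginJars, ClassLoader parent) {
        URLClassLoader fresh = new URLClassLoader(pluginJars, parent);
        URLClassLoader old = current;
        current = fresh;
        generation++;
        if (old != null) {
            try {
                old.close(); // releases the old JAR's file handles
            } catch (IOException ignored) {
                // Best effort: the new loader is already active.
            }
        }
        return fresh;
    }

    synchronized ClassLoader current() { return current; }

    synchronized int generation() { return generation; }
}
```

Closing the old loader is not enough on its own: its classes can only be unloaded once nothing references the old plugin instances, which is why the old plugin object is dropped together with its loader.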

Runtime Performance Optimizations

Key techniques include:

Traffic warm-up: start with 1 % of traffic to warm up the detection logic before taking full load.

Soft‑degrade: disable high‑cost hooks when CPU or latency exceeds thresholds, falling back to observation mode.

No bytecode rollback on unload: keep transformed classes but disable their hooks to avoid STW during plugin removal.
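Soft-degrade and hook disabling both come down to a cheap switch that every woven hook consults before doing expensive work. The sketch below shows one possible shape. SoftDegradeSwitch, its thresholds, and the automatic recovery once metrics are healthy again are assumptions for illustration, and the article does not specify what observation mode still records.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical soft-degrade switch consulted by high-cost hooks.
final class SoftDegradeSwitch {
    private final double maxCpuPercent;
    private final double maxLatencyMs;
    // true = hooks run their checks; false = hooks skip them.
    private final AtomicBoolean hooksEnabled = new AtomicBoolean(true);

    SoftDegradeSwitch(double maxCpuPercent, double maxLatencyMs) {
        this.maxCpuPercent = maxCpuPercent;
        this.maxLatencyMs = maxLatencyMs;
    }

    // Called by a metrics sampler. Disables the hooks when either limit is
    // exceeded and re-enables them once both values are back within limits
    // (the automatic recovery is an assumption of this sketch).
    void onSample(double cpuPercent, double latencyMs) {
        hooksEnabled.set(cpuPercent <= maxCpuPercent && latencyMs <= maxLatencyMs);
    }

    // Called from inside a woven hook: the transformed bytecode stays in
    // place, only the expensive check is skipped while degraded.
    void runIfEnabled(Runnable highCostCheck) {
        if (hooksEnabled.get()) {
            highCostCheck.run();
        }
    }

    boolean hooksEnabled() { return hooksEnabled.get(); }
}
```

Because the switch is a single atomic flag read, turning hooks off needs no class redefinition and therefore no stop-the-world pause.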

Global monitoring tracks host injection coverage, core-dump count, bytecode modifications performed during peak hours, and circuit-breaker timeouts (Figure 15). JVM-level metrics such as Metaspace usage are also collected; if free Metaspace falls below a threshold, RASP injection is blocked to prevent out-of-memory crashes.
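A Metaspace pre-check of this kind can be approximated from inside the JVM with the standard java.lang.management API. The sketch below is a guess at the shape of such a check, not the article's code: MetaspaceGuard is a hypothetical name, the pool name "Metaspace" is what HotSpot reports, and treating an undefined maximum as "allowed" is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical guard that refuses injection when Metaspace headroom is low.
final class MetaspaceGuard {
    // Pure decision function, kept separate so it can be tested in isolation.
    static boolean hasHeadroom(long usedBytes, long maxBytes, long minFreeBytes) {
        if (maxBytes < 0) {
            // MemoryUsage reports -1 when no maximum is defined
            // (for example, -XX:MaxMetaspaceSize is not set).
            return true;
        }
        return maxBytes - usedBytes >= minFreeBytes;
    }

    // Reads the live Metaspace pool, if the JVM exposes one under that name.
    static boolean injectionAllowed(long minFreeBytes) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                if (usage == null) {
                    continue; // pool is no longer valid
                }
                return hasHeadroom(usage.getUsed(), usage.getMax(), minFreeBytes);
            }
        }
        return true; // no such pool on this JVM: do not block
    }
}
```

An external daemon would need a different data source, such as JMX over a connector or jstat output, but the decision logic would be the same.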

Figure 15: Monitoring metric overview

Performance Benchmarks

On an 8-core / 8 GB testbed with QPS ranging from 20 to 500, the added CPU overhead stayed between 0.06 % and 3.73 % (Table 1). Heap memory grew by ~180 MB on average, and Metaspace by ~2 MB, both comparable to open-source RASP solutions. Latency remained under 5 ms for the majority of requests.

Table 1. CPU overhead by QPS:

QPS 20: Δ 0.71 %

QPS 100: Δ 0.06 %

QPS 200: Δ 0.15 %

QPS 300: Δ 0.66 %

QPS 400: Δ 2.97 %

QPS 500: Δ 3.73 %

Memory usage increase at peak QPS is about 200 MB, confirming low overhead.

Supported Vulnerability Types & Engine Choice

Meituan RASP detects command execution (including native methods), SQL injection, file access, deserialization, JNDI, expression injection, etc. In a performance comparison of a Java-based detection engine against a JavaScript-based one, a 100,000-iteration loop averaged ~6 ms in Java versus 585 ms in JavaScript, which justified the Java implementation.

Figure: Java vs JavaScript performance

Detection Script Lifecycle & Blocking Flow

Scripts implement a defined interface; the agent periodically checks for updates, loads new classes via a dedicated class loader, and instantiates them. Bytecode weaving inserts hooks at method before, return, and throw points (Figure 16). When a hook returns a block exception, execution stops; when it returns a replacement object, the method returns the patched value; otherwise execution proceeds unchanged.
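The three outcomes of a hook (block, replace, proceed) can be modelled as a wrapper around the original call. The real agent weaves this logic into bytecode, so the sketch below only illustrates the control flow. All names (HookFlowSketch, Decision, BlockException) are hypothetical, and modelling the block as a thrown exception is an assumption.

```java
import java.util.function.Supplier;

// Hypothetical model of the before / return / throw hook points.
final class HookFlowSketch {
    static final class BlockException extends RuntimeException {
        BlockException(String message) { super(message); }
    }

    // Result of the "before" hook: proceed, or return a replacement value.
    static final class Decision {
        final boolean replace;
        final Object value;

        private Decision(boolean replace, Object value) {
            this.replace = replace;
            this.value = value;
        }

        static Decision proceed() { return new Decision(false, null); }
        static Decision replaceWith(Object value) { return new Decision(true, value); }
    }

    interface Hook {
        Decision before(Object[] args); // may throw BlockException
        default Object onReturn(Object result) { return result; }
        default void onThrow(RuntimeException error) { }
    }

    static Object invoke(Hook hook, Object[] args, Supplier<Object> original) {
        Decision decision = hook.before(args); // a BlockException stops execution here
        if (decision.replace) {
            return decision.value;             // patched value, original never runs
        }
        Object result;
        try {
            result = original.get();
        } catch (RuntimeException e) {
            hook.onThrow(e);                   // throw hook point
            throw e;
        }
        return hook.onReturn(result);          // return hook point
    }
}
```

A detection script would implement before to inspect the arguments, for example the command line passed to a process-spawning method, and choose one of the three outcomes.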

Figure 16: RASP block and hot-fix flow

Conclusion

After years of iteration, Meituan RASP now covers the vast majority of Java services, supports multiple web containers, and detects a wide range of vulnerabilities with minimal performance impact. Ongoing plans include supporting newer container forms, further reducing injection overhead, and automating configuration distribution.
