
Performance Optimization Practices in Vivo Push Recommendation Service

The Vivo Push recommendation service was dramatically accelerated by replacing regex‑based String.split with a custom utility, converting massive String‑key maps to long keys, and moving large caches off‑heap. Together these changes cut latency by about 32%, boosted throughput by roughly 45%, and nearly doubled overall performance.

vivo Internet Technology

This article presents a comprehensive performance tuning case study of the Vivo Push recommendation service, a CPU‑intensive Java backend that processes user events from Kafka and selects articles for push notifications.

Background: As the business grew, request concurrency and model computation volume increased, causing a severe Kafka consumption backlog and rising end-to-end latency. The team needed a systematic performance improvement.

Optimization Metrics and Approach: The primary metric is throughput (TPS), defined as TPS = concurrency / average response time (RT). Since CPU utilization was already above 80%, raising concurrency further was not viable, so the focus shifted to reducing RT.
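To make the formula concrete, here is a small arithmetic sketch (the concurrency and RT numbers are hypothetical, chosen only for illustration):

```java
public class TpsExample {
    public static void main(String[] args) {
        // Hypothetical numbers for illustration only.
        int concurrency = 200;   // in-flight requests
        int avgRtMillis = 50;    // average response time

        // TPS = concurrency / average RT (RT in seconds, so scale by 1000)
        int tps = concurrency * 1000 / avgRtMillis;
        System.out.println(tps); // 4000

        // With CPU already above 80%, concurrency cannot be raised much,
        // so halving RT is the lever that doubles throughput.
        int tpsHalvedRt = concurrency * 1000 / (avgRtMillis / 2);
        System.out.println(tpsHalvedRt); // 8000
    }
}
```

This is why all three optimizations below target RT rather than concurrency.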

The optimization was divided into three main parts: hotspot code optimization, map lookup optimization, and JVM GC optimization.

1. Hotspot Code Optimization

The team identified that java.lang.String.split consumed ~13% of CPU time. Three issues were found: (1) split uses regular expressions even for simple delimiters, (2) internal conversions add overhead, and (3) many calls only need the first token.
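Issue (1) is easy to demonstrate: String.split interprets its argument as a regular expression, which both adds overhead and produces surprising results for metacharacter delimiters. A minimal illustration:

```java
import java.util.Arrays;

public class SplitRegexPitfall {
    public static void main(String[] args) {
        // "." is a regex metacharacter: every character matches the
        // delimiter, all tokens are empty, and trailing empty tokens
        // are removed -- leaving a zero-length array.
        String[] dots = "a.b.c".split(".");
        System.out.println(dots.length); // 0

        // Escaping the dot gives the intended tokens, at the cost of
        // regex matching on every call.
        String[] escaped = "a.b.c".split("\\.");
        System.out.println(Arrays.toString(escaped)); // [a, b, c]
    }
}
```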

To address these, a custom utility SplitUtils was created, providing splitFirst and split methods that avoid regex, return a List<String> directly, and support a limit parameter.

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang3.StringUtils;

/**
 * Custom split utility that avoids regex.
 */
public class SplitUtils {

    /**
     * Return the first token after splitting.
     */
    public static String splitFirst(final String str, final String delim) {
        if (null == str || StringUtils.isEmpty(delim)) {
            return str;
        }
        int index = str.indexOf(delim);
        if (index < 0) {
            return str;
        }
        if (index == 0) {
            // delimiter at start
            return "";
        }
        return str.substring(0, index);
    }

    /**
     * Return all tokens as a List.
     */
    public static List<String> split(String str, final String delim) {
        if (null == str) {
            return new ArrayList<>(0);
        }
        if (StringUtils.isEmpty(delim)) {
            List<String> result = new ArrayList<>(1);
            result.add(str);
            return result;
        }
        final List<String> stringList = new ArrayList<>();
        while (true) {
            int index = str.indexOf(delim);
            if (index < 0) {
                stringList.add(str);
                break;
            }
            stringList.add(str.substring(0, index));
            str = str.substring(index + delim.length());
        }
        return stringList;
    }
}

Micro‑benchmarks using JMH showed the custom implementation improved throughput by ~50% for single‑character delimiters and maintained performance for multi‑character delimiters, while also providing a 2× speedup when only the first token was needed.
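The article's measurements came from JMH; the self-contained sketch below only approximates them with a nanoTime loop, and the key format is a made-up example, but it shows where the splitFirst win comes from:

```java
import java.util.concurrent.TimeUnit;

public class SplitBenchSketch {

    // Minimal stand-in for the article's SplitUtils.splitFirst, inlined
    // so the sketch is self-contained.
    static String splitFirst(String str, String delim) {
        int index = str.indexOf(delim);
        return index < 0 ? str : str.substring(0, index);
    }

    public static void main(String[] args) {
        String sample = "user_123_article_456_score"; // hypothetical key format
        int iterations = 1_000_000;
        long sink = 0; // consume results so the JIT cannot drop the loops

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // "_" is not a regex metacharacter, so the JDK takes its
            // single-char fast path -- but it still allocates the full
            // token array even though only the first token is needed.
            sink += sample.split("_")[0].length();
        }
        long splitNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // indexOf plus one substring: no array, no extra tokens.
            sink += splitFirst(sample, "_").length();
        }
        long firstNanos = System.nanoTime() - t1;

        System.out.printf("split: %d ms, splitFirst: %d ms (sink=%d)%n",
                TimeUnit.NANOSECONDS.toMillis(splitNanos),
                TimeUnit.NANOSECONDS.toMillis(firstNanos), sink);
    }
}
```

For stable numbers, wrap the two loops in JMH @Benchmark methods as the team did; plain wall-clock loops are vulnerable to JIT warm-up effects.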

2. Map Lookup Optimization

Profiling revealed HashMap.getOrDefault consumed ~20% of CPU time due to massive weight maps (10+ million entries) and long string keys (average length >20). The team switched keys from String to long using a custom hash, reducing collision probability and memory footprint.
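The article does not specify which hash function was used; the sketch below uses 64-bit FNV-1a purely as an illustrative stand-in, with a boxed HashMap<Long, Float> for brevity (in practice a primitive-keyed map such as fastutil's Long2FloatOpenHashMap would also avoid boxing):

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LongKeyWeights {

    // Illustrative 64-bit FNV-1a hash; the team's actual custom hash
    // is not given in the article.
    static long fnv1a64(String key) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xffL);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    public static void main(String[] args) {
        // Before: Map<String, Float> with 20+ char keys, where every hit
        // walks the character array in String.equals.
        // After: the key collapses to one long, so lookup compares a
        // single 64-bit value and the map stores no key strings at all.
        Map<Long, Float> weights = new HashMap<>();
        weights.put(fnv1a64("user_123_article_456"), 0.87f);

        Float w = weights.get(fnv1a64("user_123_article_456"));
        System.out.println(w); // 0.87
    }
}
```

The trade-off, as the article notes, is accepting a small hash-collision probability in exchange for cheaper equality checks and a ~30% smaller map.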

They also examined the String.equals implementation:

// JDK 8 implementation (String backed by a char[] value field)
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String) anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

After migrating keys to long, end‑to‑end latency dropped 20.67%, throughput rose 26.09%, and memory usage of the weight map decreased by ~30%.

3. JVM GC Optimization

GC overhead was significant on a 64‑core, 256 GB machine (≈10 s per minute of YGC). To alleviate heap pressure, the team moved large caches off‑heap using the Open‑Source Off‑Heap Cache (OHC) library.

Typical OHC usage:

OHCache ohCache = OHCacheBuilder.newBuilder()
        .keySerializer(yourKeySerializer)
        .valueSerializer(yourValueSerializer)
        .build();

Configuration included a 12 GB capacity and 1024 segments with Kryo serialization. After the switch, YGC time fell to ~800 ms per minute and overall throughput increased by ~20%.
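The essential property OHC provides — memory the garbage collector never scans — can be illustrated with nothing but a direct ByteBuffer. This is a concept sketch, not OHC's implementation:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapSketch {
    public static void main(String[] args) {
        // allocateDirect reserves memory outside the Java heap, so the
        // bytes stored here are invisible to young-generation GC --
        // the same property OHC exploits at 12 GB cache scale.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024);

        byte[] value = "cached-article-features".getBytes(StandardCharsets.UTF_8);
        offHeap.putInt(value.length); // length prefix
        offHeap.put(value);

        // Reading back means deserializing raw bytes, which is why OHC
        // requires explicit key/value serializers (Kryo in this case).
        offHeap.flip();
        byte[] restored = new byte[offHeap.getInt()];
        offHeap.get(restored);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The cost of the approach is serialization on every get/put, which is why the team paired OHC with a fast serializer (Kryo) and sized segments (1024) to limit lock contention.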

End‑to‑End Results

Combining hotspot code refactoring, map key redesign, and off‑heap caching yielded a 31.77% reduction in total latency and a 45.24% increase in throughput, while the split operation's CPU share fell to roughly 2%. Overall, the service achieved a near‑doubling of throughput, meeting the performance goals.

The article concludes that performance tuning is an ongoing process, and the presented practices—identifying bottlenecks, applying targeted code changes, and validating gains with micro‑benchmarks—can serve as a reference for similar backend systems.
