How Switching to StringBuilder Made Template Rendering 10× Faster
An in‑depth case study of Alipay’s card‑coupon template engine reveals that replacing costly String.replace calls with a custom StringBuilder implementation, combined with caching strategies, can boost rendering performance by more than tenfold, while also reducing memory overhead and improving scalability.
1. Background
1.1 Business Background
Alipay’s card pack stores users’ membership cards and coupons. Both the card cell and the card detail are rendered by combining a static template with dynamic data, which is then presented to the end user.
Figure 1 shows how coupon data is displayed on the client side, and Figure 2 illustrates the data assembly process.
Figure 1: Coupon data display on the client side
Figure 2: Client‑side data assembly process
1.2 Problem Discovery
During a recent project we revisited the coupon assembly and rendering logic. The legacy code (Figure 3) has been in use for about ten years and simply replaces variables delimited by $ symbols with dynamic values. Because this logic is core and high‑frequency, we investigated potential performance improvements.
Figure 3: Original template variable replacement code
The initial analysis identified two inefficiencies:
Each loop iteration performs two indexOf operations.
Each loop iteration creates a new substring.
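As a rough illustration of the pattern described above, the following is a hypothetical reconstruction, not the code from Figure 3. The class name, method name, and the choice to leave unknown variables untouched are all assumptions. It shows where the two indexOf calls, the substring, and the String.replace call land in each iteration.

```java
import java.util.Map;

final class LegacyTemplateRenderer {
    // Hypothetical reconstruction of the legacy loop; NOT the original code.
    // Assumes variables look like $name$ and values contain no '$'.
    static String render(String template, Map<String, String> values) {
        String result = template;
        int from = 0;
        while (true) {
            int start = result.indexOf('$', from);          // first indexOf
            if (start < 0) {
                break;
            }
            int end = result.indexOf('$', start + 1);       // second indexOf
            if (end < 0) {
                break;
            }
            String name = result.substring(start + 1, end); // new substring
            String value = values.get(name);
            if (value == null) {
                from = end + 1;                             // unknown variable: leave it in place
                continue;
            }
            // Each replace call scans the whole string and allocates a new one.
            result = result.replace("$" + name + "$", value);
            from = start + value.length();
        }
        return result;
    }
}
```

Every variable in the template costs two scans, one substring, and one full-string replace, which is why the per-request cost grows with the number of variables.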
We asked:
Can we reduce the number of indexOf and substring calls?
Is it necessary to search the template for variables on every request?
2. Performance Optimizations
We iteratively refined the implementation across five versions, ultimately achieving more than a ten‑fold speed increase.
2.1 Optimization V1
We removed indexOf and substring entirely and introduced a two-pointer scan that extracts all variables in a first pass, then replaces them in a second pass.
Figure 4: V1 implementation
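A minimal sketch of what such a two-pass approach might look like, assuming variables are written as $name$. The class and method names are hypothetical, and building names from a char array is just one way to avoid substring. The second pass still relies on String.replace, as V1 did.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TwoPointerRendererV1 {
    // Pass 1: a single scan with two positions, `open` (pending opening '$')
    // and `i` (current position). No indexOf or substring is used.
    static Set<String> extractVariables(String template) {
        Set<String> names = new HashSet<>();
        char[] chars = template.toCharArray();
        int open = -1; // -1 means no opening '$' is pending
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] != '$') {
                continue;
            }
            if (open < 0) {
                open = i;
            } else {
                names.add(new String(chars, open + 1, i - open - 1));
                open = -1;
            }
        }
        return names;
    }

    // Pass 2: one String.replace per distinct variable (still the costly part).
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (String name : extractVariables(template)) {
            String value = values.get(name);
            if (value != null) {
                result = result.replace("$" + name + "$", value);
            }
        }
        return result;
    }
}
```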
2.2 Optimization V2
Since static templates rarely change, we cached the mapping between a template ID and its variable list, eliminating the need to extract variables on each request. The cache is implemented with Google Guava.
Figure 5: Cache implementation example
Figure 6: V2 implementation
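The caching idea can be sketched as follows. To keep the example dependency-free, it substitutes a plain ConcurrentHashMap for the Guava cache the article uses, so it has no size bound or expiry. The class name, the extraction helper, and the miss counter are hypothetical and exist only for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class TemplateVariableCache {
    // templateId -> variable names found in that template.
    // Stand-in for a Guava cache; nothing is ever evicted here.
    private final ConcurrentHashMap<String, Set<String>> byTemplateId = new ConcurrentHashMap<>();

    // Counts how often extraction actually runs (cache misses), for illustration only.
    final AtomicInteger extractions = new AtomicInteger();

    Set<String> variablesFor(String templateId, String template) {
        return byTemplateId.computeIfAbsent(templateId, id -> {
            extractions.incrementAndGet();
            return extract(template);
        });
    }

    // Hypothetical extraction helper: collects the names between pairs of '$'.
    static Set<String> extract(String template) {
        Set<String> names = new HashSet<>();
        int open = -1;
        for (int i = 0; i < template.length(); i++) {
            if (template.charAt(i) != '$') {
                continue;
            }
            if (open < 0) {
                open = i;
            } else {
                names.add(template.substring(open + 1, i));
                open = -1;
            }
        }
        return names;
    }
}
```

A cache keyed only by template ID returns stale variables if a template is edited under the same ID, so a real deployment would need some form of expiry or invalidation.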
2.3 Performance Comparison (1)
Benchmarking V1 and V2 (Figure 7) shows that both improve over the original, but the gains diminish as traffic grows, indicating other bottlenecks remain.
Figure 7: V1 vs V2 performance
Further analysis revealed that String.replace itself is costly: in the implementation shown in Figure 8, each call compiles a new Pattern and creates a new string object.
Figure 8: String.replace implementation
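For readers without the figure, the sketch below shows roughly what a regex-backed literal replace looks like, using only public java.util.regex APIs. It reflects our understanding of how older JDKs such as JDK 8 implemented String.replace(CharSequence, CharSequence); later JDK releases may implement it differently, so treat this as an illustration of the cost rather than a statement about every runtime. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceSketch {
    // Behaves like String.replace(target, replacement), but makes the hidden
    // work explicit: a Pattern is compiled and a Matcher allocated on every
    // call, and a brand-new String is produced.
    static String replaceViaRegex(String input, String target, String replacement) {
        return Pattern.compile(target, Pattern.LITERAL)            // compiled per call
                .matcher(input)                                       // new Matcher per call
                .replaceAll(Matcher.quoteReplacement(replacement));  // new String per call
    }
}
```

Calling this once per variable means one pattern compilation and one full copy of the template for every placeholder.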
2.4 Optimization V3
We replaced String.replace with a manual StringBuilder approach, avoiding repeated compilation and object creation.
Figure 9: V3 implementation
Note: The variable extraction in V2 returned a Set, which lost ordering and collapsed duplicate variables. V3 switches to an ordered List to preserve correctness.
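One way to combine a cached, ordered List with StringBuilder is sketched below. This is an assumption about the general shape of such a design, not the code from Figure 9. The template is split once into alternating literal and variable parts, and each render appends them in order. A ConcurrentHashMap again stands in for the Guava cache, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CachedSegmentRendererV3 {
    // templateId -> parts; even indexes hold literal text, odd indexes hold variable names.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    String render(String templateId, String template, Map<String, String> values) {
        List<String> parts = cache.computeIfAbsent(templateId, id -> split(template));
        StringBuilder out = new StringBuilder(template.length() + 32);
        for (int i = 0; i < parts.size(); i++) {
            String part = parts.get(i);
            if (i % 2 == 0) {
                out.append(part);                               // literal text
            } else {
                String value = values.get(part);
                if (value != null) {
                    out.append(value);                          // substituted variable
                } else {
                    out.append('$').append(part).append('$');   // keep unknown placeholder
                }
            }
        }
        return out.toString();
    }

    // Runs once per template ID; order and duplicates are preserved.
    static List<String> split(String template) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = template.indexOf('$', pos);
            int close = open < 0 ? -1 : template.indexOf('$', open + 1);
            if (close < 0) {
                parts.add(template.substring(pos));             // trailing literal, possibly empty
                return parts;
            }
            parts.add(template.substring(pos, open));           // literal before the variable
            parts.add(template.substring(open + 1, close));     // variable name
            pos = close + 1;
        }
    }
}
```

Because the list keeps every occurrence in template order, a variable that appears twice is rendered twice, which a Set cannot express.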
2.5 Optimization V4
Building on V3, we removed the cache entirely and kept only the StringBuilder replacement, achieving the most significant speedup (Figure 10).
Figure 10: V4 implementation
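A cache-free, single-pass variant might look like the sketch below. It is a hypothetical illustration rather than the code in Figure 10: the class name, the initial capacity, and the decision to leave unknown variables in place are assumptions. The template is scanned once, literal runs are copied straight into a StringBuilder, and each $name$ is replaced as it is encountered.

```java
import java.util.Map;

final class SinglePassRendererV4 {
    static String render(String template, Map<String, String> values) {
        int n = template.length();
        StringBuilder out = new StringBuilder(n + 32);
        int literalStart = 0; // start of literal text not yet copied to `out`
        int open = -1;        // index of the pending opening '$', or -1 if none
        for (int i = 0; i < n; i++) {
            if (template.charAt(i) != '$') {
                continue;
            }
            if (open < 0) {
                open = i;
                continue;
            }
            // Closing '$' found: the name lies between `open` and `i`.
            String value = values.get(template.substring(open + 1, i));
            if (value != null) {
                out.append(template, literalStart, open).append(value);
                literalStart = i + 1;
            }
            // Unknown names stay inside the pending literal run and are copied verbatim.
            open = -1;
        }
        out.append(template, literalStart, n); // remaining literal text
        return out.toString();
    }
}
```

Only one output buffer is allocated per render, plus a short string per variable for the map lookup, instead of a full copy of the template per variable.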
2.6 Performance Comparison (2)
Figure 11: V1‑V4 performance comparison
Using StringBuilder yields more than a ten‑fold performance increase.
V4 is slightly faster than V3 with cache, but V3’s code is more readable. Combining the readability of V3 with the speed of V4 suggests a hybrid V5.
2.7 Optimization V5
V5 extracts variables, drops the cache, and uses StringBuilder for replacement, achieving the best balance of performance and maintainability.
Figure 12: V5 implementation and 1‑million‑iteration benchmark
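For readers who want to reproduce a comparison of this kind, a minimal timing loop is sketched below. It is not the benchmark from Figure 12. The class name, the warm-up count, and the use of a Supplier are assumptions, and a hand-rolled loop like this is only indicative. A dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.util.function.Supplier;

final class RenderBenchmarkSketch {
    // Results are accumulated here so the JIT cannot discard the rendering work.
    static volatile int sink;

    // Returns the average wall-clock time per call in nanoseconds.
    static long averageNanos(Supplier<String> render, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            sink += render.get().length();   // let the JIT compile the hot path first
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += render.get().length();
        }
        return (System.nanoTime() - start) / iterations;
    }
}
```

A run over one million iterations would pass each renderer as a lambda, for example `averageNanos(() -> template.replace("$name$", "Ann"), 100_000, 1_000_000)`, and compare the returned averages.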
3. Summary
Across five optimization iterations, overall performance improved by more than ten times.
The ranking from fastest to slowest is V4 > V3 > V5 > V2 > V1 > original. The critical bottleneck was String.replace; caching provided modest gains, while StringBuilder delivered the major boost.
Key takeaways:
String.replace incurs pattern compilation and creates new string objects, consuming CPU and memory.
Replacing it with StringBuilder reduces intermediate object creation, lowers GC pressure, and cuts CPU load.
Such performance gains translate into substantial resource savings at scale. For example, cutting per-request CPU cost by 10% in a 2000-server deployment could free capacity roughly equivalent to 200 servers, improving stability and reducing scaling costs.
In short, the simple switch from String.replace to StringBuilder is what made a 20-line piece of code ten times faster.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.