
Performance Optimization: Concepts, Metrics, and a Real‑World Case Study from Youzan Live Streaming

Performance optimization is a continuous, data-driven practice: monitor response time and concurrency, apply techniques such as indexing, caching, parallelism, and asynchronous processing, and iterate. In Youzan's live-streaming product-detail case, the team removed bottlenecks by adding multi-level caches, circuit-breaker fallbacks, and parallel sub-task aggregation.

Youzan Coder

During a holiday night, a live‑streaming merchant experienced a sudden screen freeze and a large “404” error, triggering many customer complaints and alerts. The Youzan Education technical team responded quickly, analyzed the issue, and restored the system. The root cause was insufficient performance and availability under traffic spikes.

1. What Is Performance Optimization?

Much as entropy increases in a closed system, a software system tends to become slower as usage grows and features accumulate. Performance optimization is therefore a continuous activity throughout the software lifecycle, aiming to keep response time low and concurrency capacity high.

1.1 Performance Metrics

Two main dimensions are considered:

Response Time (RT): measured by the average response time (AVG) and by percentile metrics such as TP99 and TP95.

Concurrency Capability: usually expressed as QPS (queries per second) or TPS (transactions per second), with TPS being more common in performance assessments.

Percentile metrics better reflect overall latency because they capture long‑tail requests that average values may hide.
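As a concrete illustration (with made-up latency numbers), the sketch below compares the average against nearest-rank percentiles for a sample in which a handful of requests sit in the long tail; the class and values are purely illustrative:

```java
import java.util.Arrays;

public class PercentileDemo {
    // Nearest-rank percentile: the value below which `pct` percent of samples fall.
    static long percentile(long[] sortedMillis, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // 100 hypothetical request latencies: 95 fast, 5 in the long tail.
        long[] rt = new long[100];
        Arrays.fill(rt, 20);                         // most requests at 20 ms
        for (int i = 95; i < 100; i++) rt[i] = 2000; // 5 requests at 2000 ms
        Arrays.sort(rt);

        double avg = Arrays.stream(rt).average().orElse(0);
        System.out.println("AVG  = " + avg);                // 119.0 — looks tolerable
        System.out.println("TP95 = " + percentile(rt, 95)); // 20 — hides the tail
        System.out.println("TP99 = " + percentile(rt, 99)); // 2000 — the tail is visible
    }
}
```

Here the average (119 ms) gives no hint that 5% of users waited two full seconds, while TP99 surfaces it immediately.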

1.2 The Essence of Performance Optimization

It is analogous to algorithm analysis: time complexity ↔ response time, space complexity ↔ concurrency. Optimization means improving time, improving space, or trading one for the other.

An everyday analogy is a single‑lane road limited to 50 km/h. Increasing lanes (space) or raising speed limits (time) both raise throughput.

2. How to Perform Performance Optimization

2.1 Systematic Thinking Optimization is a collaborative effort among developers, testers, and operations. Key steps include defining business scenarios, collecting monitoring and load‑test data, locating bottlenecks, applying targeted improvements, and iterating until goals are met.

2.2 Common Optimization Techniques

2.2.1 Improve Single‑Request Efficiency

Accelerate each node in the call chain (e.g., adding DB indexes, read/write splitting, caching, offloading complex queries to ES, and choosing efficient algorithms and data structures).

Reduce redundant or unnecessary queries; batch requests; select the most appropriate downstream API.
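The batching advice above can be sketched as follows. `ProductClient` and its `batchGetTitles` method are hypothetical stand-ins for a real downstream RPC interface; the point is replacing N single-item round-trips with a few deduplicated batch calls:

```java
import java.util.*;

// Hypothetical downstream client; batchGetTitles is an illustrative batch RPC.
interface ProductClient {
    Map<Long, String> batchGetTitles(List<Long> productIds); // one RPC for up to N ids
}

class DetailAssembler {
    // Deduplicate the ids, then query the downstream in fixed-size chunks:
    // ceil(distinct / batchSize) RPCs instead of one per incoming id.
    static Map<Long, String> loadTitles(ProductClient client, List<Long> ids, int batchSize) {
        List<Long> distinct = new ArrayList<>(new LinkedHashSet<>(ids));
        Map<Long, String> result = new HashMap<>();
        for (int i = 0; i < distinct.size(); i += batchSize) {
            List<Long> chunk = distinct.subList(i, Math.min(i + batchSize, distinct.size()));
            result.putAll(client.batchGetTitles(chunk));
        }
        return result;
    }
}
```

With six incoming ids containing one duplicate and a batch size of 2, this issues three batch calls rather than six single-item calls.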

2.2.2 Parallelize Internal Processing

Split a request into independent sub-tasks and process them concurrently (e.g., with CompletableFuture).
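A minimal sketch of this pattern with CompletableFuture is shown below; the three sub-task names are illustrative placeholders, not Youzan's actual services:

```java
import java.util.concurrent.*;

public class ParallelDetail {
    // Daemon threads so the worker pool does not keep the JVM alive.
    static final ExecutorService POOL = Executors.newFixedThreadPool(3, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Hypothetical sub-tasks of one "product detail" request.
    static String loadBaseInfo()  { return "base"; }
    static String loadPrice()     { return "price"; }
    static String loadInventory() { return "stock"; }

    static String assembleDetail() throws Exception {
        CompletableFuture<String> base  = CompletableFuture.supplyAsync(ParallelDetail::loadBaseInfo, POOL);
        CompletableFuture<String> price = CompletableFuture.supplyAsync(ParallelDetail::loadPrice, POOL);
        CompletableFuture<String> stock = CompletableFuture.supplyAsync(ParallelDetail::loadInventory, POOL);

        // Wait for all sub-tasks, then aggregate; total RT ≈ slowest sub-task, not the sum.
        return CompletableFuture.allOf(base, price, stock)
                .thenApply(v -> base.join() + "|" + price.join() + "|" + stock.join())
                .get(500, TimeUnit.MILLISECONDS); // overall deadline for the whole request
    }
}
```

Because the three futures run concurrently, the aggregate latency is bounded by the slowest sub-task plus scheduling overhead, rather than the sum of all three.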

2.2.3 Asynchronous Processing

Offload non-critical work to message queues, background threads, or scheduled jobs.
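A small sketch of the offloading idea, using a single background executor as a stand-in for a message-queue consumer (the method names and the audit-log example are illustrative):

```java
import java.util.concurrent.*;

public class AsyncOffload {
    // Single daemon worker standing in for a message-queue consumer.
    static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // The request thread returns as soon as the critical path is done;
    // non-critical work (audit logging here) is handed off and runs later.
    static String handleOrder(String orderId, BlockingQueue<String> auditLog) {
        String receipt = "ok:" + orderId;                        // critical path
        WORKER.submit(() -> auditLog.add("audited " + orderId)); // non-critical, async
        return receipt;
    }
}
```

The caller's response time now covers only the critical path; the audit entry appears shortly afterwards on the worker thread.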

2.2.4 Parallelize Multiple Requests

Deploy services in clusters behind load balancing and use thread pools for concurrent handling.

3. Real‑World Case: Youzan Live‑Streaming Product Detail Page

The flow: user visits product detail → places an order → accesses live‑stream entry after permission check. Performance monitoring showed the bottleneck at the “live product detail” request.

Identified problems:

Weak‑dependency interfaces lacked caching and fallback, causing failures.

Strong‑dependency interfaces had poor performance, leading to timeouts.

Downstream queries returned more fields than needed, inflating RT.

Incorrect downstream API usage caused unnecessary extra calls.

Stateless query interfaces were not cached, resulting in frequent RPC calls.

Optimization actions:

Weak-dependency interfaces: set the RPC timeout to 1.5× TP99/TP95, add a two-level cache (zanKV) with asynchronous refresh, and enable circuit-breaker fallback.

Strong-dependency interfaces: set the timeout to 1.5× TP99, configure Dubbo retries (2–3 attempts), and apply caching (the transparent multi-level cache TMC for hot data, Guava local cache for the rest).

Parallelized product‑detail aggregation by splitting into four independent sub‑tasks and using an internal parallel‑processing framework.

Standardized query APIs into three granularity levels (coarse, medium, fine) to ensure upstream services request only needed fields.
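The "cache plus fallback" treatment of weak dependencies can be sketched with JDK classes alone. In Youzan's actual stack this role is played by zanKV/TMC/Guava caches with RPC timeouts and circuit breakers; the class below is a deliberately minimal stand-in with illustrative names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Minimal JDK-only sketch of "cache + fallback" for a weak dependency:
// serve fresh data when the RPC succeeds, the last cached value when it
// fails, and a static default when nothing has ever been cached.
public class CachedWeakDependency {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final String fallback;

    CachedWeakDependency(String fallback) { this.fallback = fallback; }

    String get(String key, Supplier<String> rpcCall) {
        try {
            String fresh = rpcCall.get(); // remote call; bounded by an RPC timeout in practice
            cache.put(key, fresh);        // refresh the cache on success
            return fresh;
        } catch (RuntimeException rpcFailure) {
            // Degrade instead of failing the whole page.
            return cache.getOrDefault(key, fallback);
        }
    }
}
```

The key property is that a weak dependency going down no longer produces errors on the detail page: users see slightly stale or default content instead of a failure.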

4. Summary

Performance optimization is an ongoing process that must be tailored to specific cases. Early‑stage issues may be solved by indexing, while later stages may require batching, caching, or architectural changes. Continuous monitoring, data‑driven analysis, and a systematic approach are essential.

For collaboration or recruitment, contact: [email protected].
