
Evolution and Architecture of Vivo's Game Recommendation System

This article chronicles the development, architectural challenges, and engineering solutions of a large‑scale game recommendation platform, covering background, initial models, business growth, caching strategies, GC optimization, rate‑limiting, fine‑grained operations, multi‑path recall, A/B testing, and future intelligent enhancements.

Architecture Digest

Authors: Vivo Internet Server Team – Ke Jiachen, Wei Ling

The article introduces the development history of a game recommendation project, discusses business and architectural challenges encountered in large‑scale systems, and presents solutions implemented by engineers, offering valuable references for similar projects.

1. Background and Significance of Game Recommendation

Search and recommendation are the two main ways users acquire information. Search requires the user to ask; a recommendation system connects users with content without an explicit request, saving them time and effort. This is what motivated building a dedicated game recommendation system.

The system distributes games across the major traffic entry points (Game Center, App Store, Browser, Jovi, etc.), using a range of recommendation algorithms and strategies to surface games that users are likely to want and that carry commercial value. It was later extended to recommend content and materials as well.

2. Initial Model of Game Recommendation

The goal is to deliver games that users want while preserving commercial value. Commercial value is controlled through operational rules; user intent is inferred through algorithmic ranking based on feature data and user feedback.

The model consists of four parts:

Operational recommendation rule configuration

Algorithm model training

Recommendation strategy activation

Data point reporting

Before a strategy takes effect, the operations team creates configuration rules, which are stored in a cache that serves the high-concurrency recommendation interface. When a user opens a page, the backend calls the recommendation service with the scene information. The service maps the scene to its configuration (recall sources, tags, expiration, algorithms, etc.), calls the algorithm service to rank the candidates, and returns the results to the app. User behavior and recommendation data are reported along the way.
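The request path just described can be condensed into one function. This is a minimal sketch, not the team's code: `SceneConfig`, `recommend`, the tag-based filter, and the `score` and `report` callables (standing in for the algorithm service and the reporting pipeline) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SceneConfig:
    """Hypothetical per-scene configuration; field names are illustrative."""
    recall_pool: list[int]                      # candidate game IDs for this scene
    blocked_tags: set[str] = field(default_factory=set)
    size: int = 10                              # number of results to return

def recommend(scene: str, user_id: str,
              configs: dict[str, SceneConfig],
              game_tags: dict[int, set[str]],
              score: Callable[[str, int], float],
              report: Callable[[dict], None]) -> list[int]:
    cfg = configs[scene]  # map the scene to its cached configuration
    # Recall: take the configured pool and drop games carrying a blocked tag.
    candidates = [g for g in cfg.recall_pool
                  if not (game_tags.get(g, set()) & cfg.blocked_tags)]
    # Ranking: delegate scoring to the algorithm service (a callable here).
    ranked = sorted(candidates, key=lambda g: score(user_id, g), reverse=True)
    result = ranked[:cfg.size]
    # Reporting: record what was exposed to this user in this scene.
    report({"scene": scene, "user": user_id, "games": result})
    return result
```

Keeping the configuration lookup, ranking, and reporting as separate injected pieces mirrors the four-part model above, where rules, algorithms, and reporting evolve independently.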

3. Business Growth and Architecture Evolution

As more business lines adopted the recommendation system, its functionality expanded. It now covers scenarios such as category pages, topics, rankings, home pages, and search; applies strategies such as manual intervention, dispersion, resource allocation, and guaranteed volume; and supports several recommendation types, including jointly operated games, mini-games, content, and recommendation reasons.

These diverse scenarios increase complexity, posing new challenges in performance, scalability, and availability, driving architectural changes.

3.1 General Composite Strategy in an Entropic Environment

During the 0‑to‑1 phase, the focus was on increasing distribution volume using a layered architecture. In the 1‑to‑2 phase, the system also recommends content and materials, requiring flexible strategy invocation, dynamic rules, and user‑personalized logic, leading to the need for a highly reusable, extensible, low‑code strategy framework.

The composite strategy has two roles, the acceptor and the executor, which communicate through a shared recommendation context. Matchers, listeners, and processes carry the actual logic: the acceptor selects a strategy template, listeners perform preprocessing, the executor runs each process whose preconditions are met, and the results are logged.
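A minimal sketch of the acceptor/executor pattern follows. It assumes matchers are plain predicates that the acceptor uses to choose a template, and that each process carries its own precondition; `RecContext`, `Template`, `Process`, `Acceptor`, and `Executor` are hypothetical names, not the framework's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RecContext:
    """Shared context through which the acceptor and executor communicate."""
    scene: str
    user_id: str
    items: list[int] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

@dataclass
class Process:
    name: str
    precondition: Callable[[RecContext], bool]  # run only when this returns True
    run: Callable[[RecContext], None]

@dataclass
class Template:
    name: str
    listeners: list[Callable[[RecContext], None]]  # preprocessing hooks
    processes: list[Process]

class Acceptor:
    """Selects a strategy template using (matcher, template) pairs checked in order."""
    def __init__(self, routes: list[tuple[Callable[[RecContext], bool], Template]]):
        self.routes = routes

    def accept(self, ctx: RecContext) -> Template:
        for matcher, template in self.routes:
            if matcher(ctx):
                return template
        raise LookupError(f"no strategy template matches scene {ctx.scene!r}")

class Executor:
    """Runs listeners, then every process whose precondition holds, logging each step."""
    def execute(self, template: Template, ctx: RecContext) -> RecContext:
        for listener in template.listeners:
            listener(ctx)
        for process in template.processes:
            if process.precondition(ctx):
                process.run(ctx)
                ctx.log.append(f"{template.name}:{process.name}")
        return ctx
```

Because templates are just data (listeners plus guarded processes), a new scene can often be supported by assembling existing pieces rather than writing new code, which is the low-code reuse the section aims for.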

3.2 Multi‑Level Cache and Near‑Real‑Time Strategy

The system handles peak traffic of about 30k TPS, and the workload is read-heavy. To keep reads fast, it combines Redis with a local in-process cache. A configuration update is written to MySQL first and then to Redis. Local cache entries expire lazily: an expired entry is reloaded from Redis on its next read, so all nodes eventually converge on the latest configuration (eventual consistency).
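The write path and the lazily expiring read path can be sketched as follows. Plain dictionaries stand in for MySQL and Redis, and the TTL value, `TwoLevelCache`, and `update_config` are hypothetical; the injectable clock exists only to make expiry observable.

```python
import time
from typing import Callable, Optional

class TwoLevelCache:
    """Local in-process cache in front of a shared store (standing in for Redis).
    Entries expire lazily: staleness is only checked when a key is read."""
    def __init__(self, remote_get: Callable[[str], Optional[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.remote_get = remote_get
        self.ttl = ttl_seconds
        self.clock = clock
        self.local: dict[str, tuple[str, float]] = {}  # key -> (value, loaded_at)

    def get(self, key: str) -> Optional[str]:
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]              # fresh local hit: no network round trip
        value = self.remote_get(key)     # miss or expired: reload from the shared store
        if value is not None:
            self.local[key] = (value, now)
        else:
            self.local.pop(key, None)
        return value

def update_config(db: dict, redis: dict, key: str, value: str) -> None:
    db[key] = value      # 1. source of truth (MySQL in the article)
    redis[key] = value   # 2. shared cache (Redis); local copies catch up on expiry
```

The stale window is bounded by the TTL, which is exactly the delay that grows in practice as more nodes hold their own local copies.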

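As the number of nodes grew, so did the delay before all of them converged. To avoid serving stale cache entries after a configuration change, the team added a message queue that notifies nodes of changes, combined with version comparison, so that local caches are synchronized in near real time.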
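A minimal sketch of message-driven refresh with version comparison, assuming every configuration key carries a monotonically increasing version number. `queue.Queue` stands in for the message queue and a dictionary for Redis; `ChangeMessage`, `VersionedLocalCache`, and `drain` are hypothetical.

```python
import queue
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeMessage:
    key: str
    version: int  # version of the configuration that triggered the message

class VersionedLocalCache:
    def __init__(self, remote: dict[str, tuple[str, int]]):
        self.remote = remote                          # stands in for Redis: key -> (value, version)
        self.local: dict[str, tuple[str, int]] = {}   # this node's copy

    def on_message(self, msg: ChangeMessage) -> bool:
        """Apply a change notification; return True if the local copy was refreshed."""
        current = self.local.get(msg.key)
        if current is not None and current[1] >= msg.version:
            return False  # duplicate or out-of-order message: local copy is already as new
        self.local[msg.key] = self.remote[msg.key]    # pull the latest value and version
        return True

def drain(q: "queue.Queue[ChangeMessage]", cache: VersionedLocalCache) -> int:
    """Process all pending messages; return how many caused a refresh."""
    refreshed = 0
    while not q.empty():
        if cache.on_message(q.get_nowait()):
            refreshed += 1
    return refreshed
```

The version check makes redelivered or reordered messages harmless, which matters because most message queues guarantee at-least-once rather than exactly-once delivery.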

3.3 High‑Concurrency Service Garbage Collection Handling

Frequent Full GC (FGC) in Java services caused latency spikes. The team applied several measures:

Move caches that change rarely (hour-level refresh) off-heap.

For caches that change infrequently (minute-level refresh), skip the update when the new value is unchanged.

Switch to G1 GC with tuned parameters:

-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=25
-XX:MaxNewSize=3072M
-Xms4608M
-Xmx4608M
-XX:MetaspaceSize=512M
-XX:MaxMetaspaceSize=512M
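The second measure above, skipping refreshes whose content is unchanged, can be sketched as a digest comparison performed before parsing. The sketch assumes the cache is refreshed from a serialized payload (JSON bytes here); `SkipUnchangedCache` is a hypothetical name, and Python stands in for the Java service only to show the logic. The GC benefit it targets is JVM-specific: when the payload is identical, no new object graph is built, so fewer objects are allocated and later promoted.

```python
import hashlib
import json
from typing import Any, Optional

class SkipUnchangedCache:
    """Rebuild the cached snapshot only when the serialized payload changed."""
    def __init__(self):
        self.value: Any = None
        self.digest: Optional[str] = None
        self.rebuilds = 0

    def refresh(self, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self.digest:
            return False                   # unchanged: skip parsing, keep the existing objects
        self.value = json.loads(payload)   # changed: build and swap in the new snapshot
        self.digest = digest
        self.rebuilds += 1
        return True
```

Hashing the raw bytes is cheap compared with deserializing them, so the check pays for itself whenever most refresh cycles see identical data.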

3.4 Rate Limiting, Degradation, and Fallback Strategies

Components such as Hystrix, Sentinel, and Resilience4j are used for circuit breaking and rate limiting; in addition, the team implements layered rate limiting so that critical services keep priority under load. For personalized fallback, historical user data is stored and used to generate tailored fallback recommendation lists when the normal path cannot serve a request.
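Layered rate limiting and personalized fallback can be combined as sketched below. This is an illustration under stated assumptions, not how Hystrix, Sentinel, or Resilience4j work internally: a single token bucket is shared by all callers, and each priority level (lower number means more critical) must leave a hypothetical reserve of tokens untouched, so low-priority traffic is rejected first. `LayeredLimiter` and `recommend_with_fallback` are hypothetical names.

```python
import time
from typing import Callable

class LayeredLimiter:
    """Shared token bucket; each priority must leave `reserve[priority]` tokens behind."""
    def __init__(self, rate: float, capacity: float, reserve: dict[int, float],
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity = rate, capacity
        self.reserve = reserve   # priority -> tokens that must remain after acquiring
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, priority: int) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens - 1 >= self.reserve.get(priority, 0.0):
            self.tokens -= 1
            return True
        return False

def recommend_with_fallback(user_id: str, priority: int, limiter: LayeredLimiter,
                            recommend: Callable[[str], list[int]],
                            history: dict[str, list[int]],
                            default: list[int]) -> list[int]:
    if limiter.try_acquire(priority):
        try:
            return recommend(user_id)
        except Exception:
            pass  # downstream failure: degrade to the fallback below
    # Personalized fallback: a list precomputed from the user's history, else a generic list.
    return history.get(user_id, default)
```

Giving critical callers a zero reserve lets them drain the bucket completely, while lower layers stop once their reserve is reached; the fallback path never calls the ranking service, so it stays cheap under overload.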

4. Exploration of Fine‑Grained Operation Modes

After the 0-to-1 expansion phase and the 1-to-2 rapid-growth phase, the architecture stabilized, and the focus shifted to improving efficiency and reducing cost through fine-grained operations, including a layered orthogonal experiment platform.

4.1 Multi‑Layer Hash Orthogonal Experiment Platform

Accurate recommendation requires refining strategies iteratively based on data feedback. Traditional A/B testing isolates traffic by giving each experiment an exclusive slice, which does not scale to complex scenarios where many experiments run at once. The multi-layer hash approach instead assigns traffic within each layer using a hash function salted with the layer identifier, so traffic is split randomly within a layer and independently across layers.
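A minimal sketch of layered hash bucketing, assuming each layer hashes a device identifier together with its own salt into 100 buckets, and each experiment group owns a bucket range within its layer. SHA-256, the bucket count, and the names `bucket` and `assign` are illustrative choices, not the platform's documented scheme.

```python
import hashlib

def bucket(device_id: str, salt: str, buckets: int = 100) -> int:
    """Map a device to a bucket for one layer; the layer's salt decorrelates layers."""
    digest = hashlib.sha256(f"{salt}:{device_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def assign(device_id: str, layers: dict[str, list[tuple[str, range]]]) -> dict[str, str]:
    """layers: layer salt -> list of (experiment group, bucket range).
    Returns the group the device falls into in each layer."""
    result = {}
    for salt, groups in layers.items():
        b = bucket(device_id, salt)
        for group, rng in groups:
            if b in rng:
                result[salt] = group
                break
    return result
```

Because each layer uses a different salt, a device's bucket in one layer says nothing about its bucket in another, so the same user can take part in one experiment per layer without the experiments biasing each other. For example, with a 50/50 split in one layer and a 90/10 split in another, about 5% of devices land in both the first layer's first half and the second layer's 10% group.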

The experiment lifecycle includes preparation (hypothesis, baseline/experiment traffic split, configuration), execution (traffic routing, strategy execution, data reporting), and analysis (big‑data aggregation, metric evaluation, strategy iteration).

Experiment modules consist of configuration (mapping traffic to scenarios), data reporting (SDK‑based logging of game and request dimensions), and result analysis (big‑data processing to guide strategy adjustments).

Example response from the recommendation interface:

{"code":0,"data":[{"score":0.016114970572330977,"data":{"gameId":53154,"appId":1364982,"recommendReason":null},"gameps":"tracking info"}],"reqId":"20200810174423TBSIowaU52fjwjjz"}

Example experiment report log:

{"reqId":"20200810142134No5UkCibMdAvopoh","scene":"appstore.idx","imei":"869868031396914","experimentInfo":[{"experimentId":"RECOMMENDATION_SCENE","salt":"RECOMMENDATION_SCENE","imei":"3995823625","sinfo":"strategy info"},{"experimentId":"AUTO_RECOMMENDATION_REASON","salt":"RECOMMENDATION_SCENE","imei":"1140225751","sid":"3,4,5"}]}
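The analysis step aggregates reported events per experiment group and compares metrics against the baseline. The sketch below is an in-memory stand-in for that big-data job; the flattened record shape (`experimentId`, `group`, `event`) and the functions `ctr_by_group` and `relative_lift` are hypothetical and do not reproduce the report format shown above.

```python
from collections import defaultdict

def ctr_by_group(events: list[dict]) -> dict[tuple[str, str], float]:
    """events: flattened records {"experimentId", "group", "event"}, where event is
    "exposure" or "click". Returns click-through rate per (experiment, group)."""
    exposures: dict[tuple[str, str], int] = defaultdict(int)
    clicks: dict[tuple[str, str], int] = defaultdict(int)
    for e in events:
        key = (e["experimentId"], e["group"])
        if e["event"] == "exposure":
            exposures[key] += 1
        elif e["event"] == "click":
            clicks[key] += 1
    return {k: clicks[k] / n for k, n in exposures.items() if n > 0}

def relative_lift(ctr: dict[tuple[str, str], float], experiment: str,
                  treatment: str, baseline: str) -> float:
    """Relative CTR change of the treatment group over the baseline group."""
    return ctr[(experiment, treatment)] / ctr[(experiment, baseline)] - 1.0
```

For instance, 5 clicks on 100 baseline exposures against 6 clicks on 100 treatment exposures gives a lift of about 20%; a real evaluation would also check statistical significance before iterating on the strategy.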

4.2 Multi‑Path Recall Optimization

Recall narrows a massive candidate pool to a manageable size for ranking. Single‑path recall suffers from limited coverage; multi‑path recall combines personalized, algorithmic, pool‑based, and tag‑based recalls, merging, filtering, and truncating results before scoring.
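Merging, filtering, and truncating several recall paths can be sketched as below. It assumes paths are tried in priority order, each path has a hypothetical quota, and a game found by several paths is kept only once under the first path that returned it; `multi_path_recall` and the path names are illustrative.

```python
from typing import Callable

def multi_path_recall(paths: dict[str, Callable[[str], list[int]]],
                      user_id: str,
                      is_allowed: Callable[[int], bool],
                      quota: dict[str, int],
                      limit: int) -> list[tuple[int, str]]:
    """Run each recall path, keep at most quota[path] new items from it, drop
    duplicates (first path wins) and filtered games, then truncate to `limit`."""
    seen: set[int] = set()
    merged: list[tuple[int, str]] = []
    for name, recall in paths.items():   # dict insertion order = path priority
        taken = 0
        for game_id in recall(user_id):
            if taken >= quota.get(name, limit):
                break
            if game_id in seen or not is_allowed(game_id):
                continue
            seen.add(game_id)
            merged.append((game_id, name))  # remember the source path for analysis
            taken += 1
    return merged[:limit]
```

Per-path quotas keep one strong path from crowding out the others, and recording the source path of each candidate makes it possible to measure how much each path contributes after ranking.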

4.3 Dynamic Parameter Adjustment for Exposure

Algorithm performance is evaluated by exposure, download, CTR, etc. To react quickly to business needs, a dynamic tuning mechanism adjusts exposure in real time based on collected metrics, categorizing games into tiers and time windows, computing weight factors, and modifying exposure accordingly.
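One possible shape of the weight-factor computation is sketched below. The article does not give its formula, so the rule here is an assumption: each tier's weight is its CTR in the latest time window divided by a target CTR, raised to a damping exponent `alpha` and clamped to a bounded range, after which exposure shares are renormalized. `WindowStats`, `weight_factor`, `adjust_exposure`, and all default values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    exposures: int
    clicks: int

def weight_factor(stats: WindowStats, target_ctr: float,
                  alpha: float = 0.5, lo: float = 0.5, hi: float = 2.0) -> float:
    """Hypothetical rule: boost tiers beating their target CTR, damp those below it.
    alpha controls sensitivity; [lo, hi] bounds each adjustment step."""
    if stats.exposures == 0:
        return 1.0  # no data in this window: leave exposure unchanged
    ratio = (stats.clicks / stats.exposures) / target_ctr
    return min(hi, max(lo, ratio ** alpha))

def adjust_exposure(shares: dict[str, float], window: dict[str, WindowStats],
                    targets: dict[str, float]) -> dict[str, float]:
    """shares: current exposure share per tier (summing to 1). Returns new shares."""
    raw = {tier: share * weight_factor(window[tier], targets[tier])
           for tier, share in shares.items()}
    total = sum(raw.values())
    return {tier: v / total for tier, v in raw.items()}
```

With two tiers at equal share, one at CTR 0.08 and one at 0.02 against a 0.05 target, the weights are about 1.26 and 0.63, so the shares move to roughly 2/3 and 1/3. Clamping keeps a single noisy window from swinging exposure too far.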

5. Outlook: Intelligent Construction

Future work aims to build a full‑stack support system covering search, intelligent operations, smart coupons, push, and user feedback processing, further enhancing platform value beyond simple distribution.
