Game Recommendation System: Architecture, Models, Scaling, and Operational Practices
The article details the design, evolution, and operational practices of Vivo’s large‑scale game recommendation platform, covering its initial rule‑based model, layered strategy framework, multi‑level caching, GC tuning, rate‑limiting, fine‑grained A/B testing, multi‑path recall, dynamic exposure control, and future intelligent extensions.
This article, authored by the Vivo Internet Server Team (Ke Jiachen and Wei Ling), presents a comprehensive overview of the development, architecture, and operational strategies of a large‑scale game recommendation platform.
Background and Significance Recommendation systems complement search: rather than waiting for an explicit query, they surface content that matches a user's vague or unstated needs, efficiently connecting users with games and services. The system distributes games across the major traffic entry points (Game Center, App Store, Browser, Jovi, etc.) and aims to recommend games with high download intent and commercial value.
Initial Recommendation Model The early model consists of four components: (1) Operational recommendation rule configuration, (2) Algorithm model training, (3) Recommendation strategy activation, and (4) Data collection via event reporting. Operational rules are stored in a cache and invoked by the recommendation service based on scene information, after which the algorithm service ranks resources and returns results to the client.
Business Growth and Architecture Evolution As more business lines adopted the platform, the system expanded to support categories, topics, leaderboards, homepages, and search, with strategies such as intervention, shuffling, resource allocation, and guaranteed volume. This increased complexity drove architectural changes to improve performance, scalability, and availability.
3.1 General Combination Strategy in an Entropic Environment A layered strategy framework was introduced, featuring two acceptors, an executor, and a context object. Matchers, listeners, and processes implement various logic branches, allowing developers to add new matchers/processes via configuration without modifying core code.
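As a rough illustration of how matchers and processes can be added without touching core code, the sketch below wires a tiny executor to two name-keyed registries. It is not the platform's implementation: the names `Context`, `MATCHERS`, `PROCESSES`, `LAYERS`, and the sample rules are all hypothetical, and acceptors and listeners are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """Carries the scene and the working candidate list between steps."""
    scene: str
    items: List[str]
    trace: List[str] = field(default_factory=list)


# Hypothetical registries: new logic is registered by name, so the executor
# below never changes when a matcher or process is added.
MATCHERS: Dict[str, Callable[[Context], bool]] = {}
PROCESSES: Dict[str, Callable[[Context], None]] = {}


def matcher(name: str):
    def register(fn):
        MATCHERS[name] = fn
        return fn
    return register


def process(name: str):
    def register(fn):
        PROCESSES[name] = fn
        return fn
    return register


@matcher("is_leaderboard")
def is_leaderboard(ctx: Context) -> bool:
    return ctx.scene == "leaderboard"


@matcher("always")
def always(ctx: Context) -> bool:
    return True


@process("pin_first")
def pin_first(ctx: Context) -> None:
    # Intervention: move the promoted item to the front if it is present.
    if "promoted" in ctx.items:
        ctx.items.remove("promoted")
        ctx.items.insert(0, "promoted")


@process("truncate_3")
def truncate_3(ctx: Context) -> None:
    # Keep only the first three candidates.
    del ctx.items[3:]


def execute(ctx: Context, layers: List[dict]) -> Context:
    """Run each layer's processes when that layer's matcher accepts the context."""
    for layer in layers:
        if MATCHERS[layer["matcher"]](ctx):
            for name in layer["processes"]:
                PROCESSES[name](ctx)
                ctx.trace.append(name)
    return ctx


# Layer wiring is plain data, so it could be loaded from configuration.
LAYERS = [
    {"matcher": "is_leaderboard", "processes": ["pin_first"]},
    {"matcher": "always", "processes": ["truncate_3"]},
]
```

With this wiring, a leaderboard request runs both layers, while any other scene skips the intervention and is only truncated.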
3.2 Multi‑Level Cache and Near‑Real‑Time Strategy To handle peak traffic of ~30k TPS, a Redis + local cache design was adopted. Configuration updates write to MySQL, then Redis; local caches lazily load from Redis and expire periodically, ensuring eventual consistency. Later, a message‑queue‑driven version‑control mechanism was added to synchronize cache updates in real time.
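The read path and the version-driven refresh described above can be sketched as follows. This is a simplified model under stated assumptions: a plain dictionary stands in for Redis, the message queue is reduced to a direct call to `on_version_message`, and the class and method names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TwoLevelCache:
    """Local cache in front of a remote store, with TTL expiry and
    version-based invalidation (illustrative only)."""

    def __init__(
        self,
        remote: Dict[str, Tuple[int, Any]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._remote = remote      # stand-in for Redis: key -> (version, value)
        self._ttl = ttl_seconds
        self._clock = clock
        self._local: Dict[str, Tuple[int, Any, float]] = {}  # key -> (version, value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._local.get(key)
        if entry is not None and entry[2] > self._clock():
            return entry[1]        # fresh local hit
        remote_entry = self._remote.get(key)
        if remote_entry is None:
            self._local.pop(key, None)
            return None
        version, value = remote_entry
        # Lazy load: the local copy is filled on a miss or after expiry.
        self._local[key] = (version, value, self._clock() + self._ttl)
        return value

    def on_version_message(self, key: str, version: int) -> None:
        """Called when an update notification arrives; drops the local copy
        if it is older than the announced version."""
        entry = self._local.get(key)
        if entry is not None and entry[0] < version:
            del self._local[key]
```

Without the notification, a stale value can be served until the local entry expires, which is the eventual-consistency window; the version message closes that window for keys that actually changed.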
3.3 High‑Throughput Service Garbage Collection The JVM experienced frequent Full GC pauses (350‑400 ms) due to large heap usage from local caches and rapid object allocation. Solutions included moving infrequently changed caches off‑heap, deduplicating minute‑level cache updates, and switching to the G1 collector with tuned parameters (e.g., -XX:+UseG1GC, -XX:MaxGCPauseMillis=200, -XX:InitiatingHeapOccupancyPercent=25, etc.).
3.4 Rate Limiting, Degradation, and Fallback Strategies Components such as Hystrix, Sentinel, and Resilience4j are used for circuit breaking and throttling. A hierarchical rate‑limiting scheme prioritizes core business flows. For personalized fallback, historical user data is stored and served when generic fallback would cause recommendation homogeneity.
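To make the hierarchical idea concrete, here is a minimal sketch of a shared per-window budget in which lower-priority tiers are cut off earlier so that core flows keep headroom, followed by a personalized fallback lookup. It does not use Hystrix, Sentinel, or Resilience4j; the class `TieredLimiter`, the tier names, and the `recommend` helper are hypothetical.

```python
from typing import Callable, Dict, List


class TieredLimiter:
    """One shared budget per time window; each tier may only consume it up to
    its own ceiling (illustrative only, not thread-safe)."""

    def __init__(self, ceilings: Dict[str, int]) -> None:
        self._ceilings = ceilings  # tier -> max shared usage at which it is still admitted
        self._used = 0

    def try_acquire(self, tier: str) -> bool:
        if self._used < self._ceilings[tier]:
            self._used += 1
            return True
        return False

    def reset(self) -> None:
        """Assumed to be called at the start of each window."""
        self._used = 0


def recommend(
    user_id: str,
    tier: str,
    limiter: TieredLimiter,
    ranker: Callable[[str], List[str]],
    history: Dict[str, List[str]],
    generic: List[str],
) -> List[str]:
    """Serve ranked results when admitted; otherwise fall back to the user's
    stored results, and only then to the generic list."""
    if limiter.try_acquire(tier):
        return ranker(user_id)
    return history.get(user_id, generic)
```

Serving stored per-user results before the generic list is what keeps degraded responses from looking identical for every user.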
4.1 Fine‑Grained Operational Model – Multi‑Layer Hash Orthogonal Experiment Platform To achieve reliable A/B testing without physical traffic isolation, a multi‑layer hash mechanism assigns traffic to experiment buckets using a function Hash(A). This ensures random, independent traffic distribution across layers and supports configuration‑driven experiment management.
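A common way to obtain independent layers is to hash the device identifier together with a per-layer salt, so the same device lands in unrelated buckets in different layers. The sketch below assumes that approach; the article does not specify the hash function, so the use of MD5, the bucket count of 100, and the names `bucket` and `assign` are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def bucket(device_id: str, salt: str, buckets: int = 100) -> int:
    """Map a device to a bucket in [0, buckets) for the layer identified by salt."""
    digest = hashlib.md5(f"{salt}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def assign(device_id: str, layers: List[dict]) -> Dict[str, str]:
    """Pick at most one experiment per layer based on bucket ranges."""
    assignment: Dict[str, str] = {}
    for layer in layers:
        b = bucket(device_id, layer["salt"])
        for experiment_id, low, high in layer["experiments"]:
            if low <= b < high:  # half-open range [low, high)
                assignment[layer["salt"]] = experiment_id
                break
    return assignment


# Hypothetical configuration: two layers, each splitting traffic 50/50.
LAYERS = [
    {"salt": "layer_a", "experiments": [("a_control", 0, 50), ("a_test", 50, 100)]},
    {"salt": "layer_b", "experiments": [("b_control", 0, 50), ("b_test", 50, 100)]},
]
```

Because each layer uses its own salt, a device in `a_test` is about equally likely to fall into `b_control` or `b_test`, which is what lets experiments in different layers share the same traffic without biasing each other.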
4.2 Recall Optimization – Multi‑Path Recall Single‑path recall suffers from limited candidate coverage. Multi‑path recall aggregates results from personalized, algorithmic, game‑pool, and tag‑based recall pipelines, then merges, filters, and truncates candidates before scoring.
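The merge, filter, and truncate steps might look like the sketch below. It assumes each recall path returns `(game_id, score)` pairs and that scores are comparable across paths, which a real system would need to ensure through normalization; the function name and the path names in the usage are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def multi_path_recall(
    paths: Dict[str, List[Tuple[int, float]]],
    blocked: Set[int],
    limit: int,
) -> List[Tuple[int, float, str]]:
    """Merge candidates from several recall paths, drop blocked games,
    deduplicate by keeping each game's best score, and truncate."""
    merged: Dict[int, Tuple[float, str]] = {}
    for path_name, candidates in paths.items():
        for game_id, score in candidates:
            if game_id in blocked:
                continue  # filter step
            best = merged.get(game_id)
            if best is None or score > best[0]:
                merged[game_id] = (score, path_name)
    # Highest score first; ties broken by game id for a stable order.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return [(game_id, score, path) for game_id, (score, path) in ranked[:limit]]
```

Keeping the originating path alongside each candidate makes it possible to measure later how much each recall path contributes to the final list.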
4.3 Dynamic Parameter Tuning for Exposure Control Exposure, download, and CTR metrics drive revenue. A dynamic tuning system adjusts game exposure in real time based on collected metrics, applying tiered exposure weights and feedback loops to balance positive and negative adjustments.
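One simple form of such a feedback loop is a bounded multiplicative update: raise a game's exposure weight when its observed download rate beats a target, lower it when it falls short, and clamp the result. The article does not give the actual rule, so the function below, its parameter names, and all default values are assumptions chosen only to illustrate the mechanism.

```python
def adjust_weight(
    weight: float,
    impressions: int,
    downloads: int,
    target_rate: float,
    min_impressions: int = 1000,
    step: float = 0.1,
    lower: float = 0.5,
    upper: float = 2.0,
) -> float:
    """Return the next exposure weight for one game (illustrative only)."""
    if impressions < min_impressions:
        return weight  # too little data to act on
    rate = downloads / impressions
    if rate > target_rate:
        weight *= 1 + step   # positive adjustment
    elif rate < target_rate:
        weight *= 1 - step   # negative adjustment
    # Clamp so repeated adjustments cannot starve or flood a game.
    return max(lower, min(upper, weight))
```

The minimum-impression guard and the clamp are the two pieces that keep the loop stable: the first avoids reacting to noise, and the second bounds how far positive or negative adjustments can accumulate.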
5. Outlook – Intelligent Construction Future work includes building an end‑to‑end intelligent operation stack (smart coupons, push, feedback processing) to further enhance platform value beyond pure distribution.
Code Example (Experiment Data Formats)

{
  "code": 0,
  "data": [
    {
      "score": 0.016114970572330977,
      "data": {
        "gameId": 53154,
        "appId": 1364982,
        "recommendReason": null
      },
      "gameps": "event-tracking information"
    }
  ],
  "reqId": "20200810174423TBSIowaU52fjwjjz"
}

Game dimension:

{
  "reqId": "20200810142134No5UkCibMdAvopoh",
  "scene": "appstore.idx",
  "imei": "869868031396914",
  "experimentInfo": [
    {
      "experimentId": "RECOMMENDATION_SCENE",
      "salt": "RECOMMENDATION_SCENE",
      "imei": "3995823625",
      "sinfo": "strategy information"
    },
    {
      "experimentId": "AUTO_RECOMMENDATION_REASON",
      "salt": "RECOMMENDATION_SCENE",
      "imei": "1140225751",
      "sid": "3,4,5"
    }
  ]
}