Design and Optimization of a Real-Time Video Recommendation Indexing System
The article describes a real‑time video recommendation indexing system that replaces 30‑minute batch index builds with an Elasticsearch‑based service. It integrates prior‑ and posterior‑data pipelines, ensures consistency via locking and version checks, supports zero‑downtime index upgrades, smooths write spikes with an adaptive rate limiter, and improves recall performance through multi‑level caching and ES tuning, delivering sub‑40 ms latency and significant business growth.
Background: In video recommendation scenarios, especially for news content, newly created videos must reach users quickly, and the quality of new items must be evaluated promptly using distribution traffic and posterior data. Both prior data (e.g., tags, author ID) and posterior data (exposure, click, play) have strict latency requirements.
Overall Architecture: The system receives video content from the content center via a message queue, processes it, stores it, builds indexes, and generates forward and inverted index data. Approximately 10 million items are stored at the retrieval layer. A recall layer selects thousands of videos based on user profiles and click history and passes them to a coarse‑ranking layer, which selects hundreds for fine‑ranking; re‑ranking then chooses the final 10+ videos shown to the user. User behavior (exposure, click, etc.) is reported back, processed in real time or offline, and fed back into the storage layer, forming a closed loop.
Scheme Design:
The old scheme rebuilt indexes every 30 minutes, which could not meet near‑real‑time freshness requirements.
Industry research showed that Redis‑based solutions lacked flexibility; the team chose an Elasticsearch (ES) based index service for its mature community and Tencent Cloud PaaS support.
The data pipeline is divided into two major parts: (1) the prior‑data pipeline – a full load (DB dump → Kafka → ES) plus an incremental path (binlog → Kafka → ES); (2) the posterior‑data pipeline – user behavior logs (hundreds of billions of records per day) are aggregated with Flink over 1‑minute sliding windows, kept in Redis for 7 days, then merged and written to ES.
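The windowed aggregation step of the posterior pipeline can be sketched in plain Python. This is a simplified stand‑in for the Flink job: the article's 1‑minute sliding windows are reduced to tumbling windows for brevity, and the event shape `(item_id, action, ts_ms)` is an assumption.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute window, matching the Flink job's window size

def aggregate_behavior(events):
    """Group raw behavior events into per-item counters, one bucket per
    1-minute window. Each event is a tuple (item_id, action, ts_ms).
    The Flink job does this continuously; here it is a batch sketch."""
    windows = defaultdict(lambda: defaultdict(int))
    for item_id, action, ts_ms in events:
        window_start = ts_ms - ts_ms % WINDOW_MS  # align to window boundary
        windows[(window_start, item_id)][action] += 1
    return dict(windows)

events = [
    ("v1", "exposure", 1_000),
    ("v1", "click",    2_000),
    ("v1", "exposure", 61_000),   # falls into the next 1-minute window
    ("v2", "play",     3_000),
]
agg = aggregate_behavior(events)
# agg[(0, "v1")] → {"exposure": 1, "click": 1}
```

Each aggregated bucket would then be merged into the 7‑day Redis state and flushed to ES, as described above.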
Consistency Issues and Solutions:
The Redis write module performs read‑modify‑write updates; concurrent writers are serialized either with a Redis lock or by partitioning Kafka messages by rowkey, so that all updates for one key are processed by a single consumer.
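The rowkey‑partitioning variant relies only on a deterministic key‑to‑partition mapping. A minimal sketch, assuming a CRC32‑style hash and an illustrative partition count (Kafka's default partitioner uses murmur2, but any stable hash gives the same single‑consumer property):

```python
import zlib

NUM_PARTITIONS = 8  # illustrative; matches the topic's partition count

def partition_for(rowkey: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a rowkey to one Kafka partition so that all
    updates to the same row are consumed by a single consumer, which
    serializes the read-modify-write sequence without any Redis lock."""
    return zlib.crc32(rowkey.encode("utf-8")) % num_partitions

# All updates for the same video always land on the same partition:
assert partition_for("video:42") == partition_for("video:42")
```

Because partition assignment is a pure function of the rowkey, no two consumers ever race on the same key.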
Atomicity between the Kafka offset commit and the Redis write: each value is written to Redis with a timestamp version, and the offset is committed only if the stored version matches.
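The version check can be approximated as a compare‑and‑set keyed on the timestamp version (in production this would be a Lua script so the check and write are atomic in Redis). A minimal sketch with a dict standing in for Redis; the function names and message shape are assumptions:

```python
store = {}  # stands in for Redis: key -> (version, value)

def versioned_write(key, value, version):
    """Apply the write only if our timestamp version is newer than the
    stored one. A stale write is rejected, so its offset is never
    committed and the message will be reprocessed safely."""
    current = store.get(key)
    if current is None or version > current[0]:
        store[key] = (version, value)
        return True
    return False

committed_offsets = []

def consume(offset, key, value, version):
    if versioned_write(key, value, version):
        committed_offsets.append(offset)  # commit only after a matching write

consume(1, "v1", {"clicks": 5}, version=100)
consume(2, "v1", {"clicks": 3}, version=90)   # out of date: rejected
```

The invariant is that an offset is committed only for writes Redis actually holds, so a crash between write and commit at worst replays an already‑applied message.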
The full‑load and incremental pipelines can disagree when a record is deleted during a full dump; this is mitigated by recording a timestamp before the dump and replaying incremental data only from that point onward, accepting a ~1‑minute window of possible inconsistency.
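The dump‑then‑replay ordering can be sketched as follows. This is a simplified model (the snapshot and log shapes are assumptions, and deletes, which would need tombstone markers, are omitted):

```python
def rebuild_index(db_snapshot, incremental_log, dump_start_ts):
    """Full rebuild: load the snapshot, then replay only incremental
    updates recorded at or after the dump start timestamp, so changes
    made while the dump was running are not lost."""
    index = dict(db_snapshot)
    for ts, key, value in incremental_log:
        if ts >= dump_start_ts:   # older entries are already in the snapshot
            index[key] = value
    return index

snapshot = {"v1": "old", "v2": "x"}
log = [(5, "v1", "stale"), (12, "v1", "fresh")]
idx = rebuild_index(snapshot, log, dump_start_ts=10)
# idx["v1"] == "fresh": the mid-dump update wins over the snapshot value
```

Entries older than the dump start are skipped because the snapshot already reflects them; replaying them could overwrite newer data.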
Online Index Upgrade (Zero‑Downtime): A multi‑step process ensures continuous service while upgrading indexes, illustrated in the provided diagram.
Write Smoothing: Flink‑aggregated output arrives in periodic spikes (a peak every minute). An adaptive token‑bucket rate limiter splits processing into a reader thread (collects messages and updates the token bucket) and a writer thread (consumes tokens and writes to ES), keeping the write rate smooth across all time periods.
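The writer side of this scheme can be sketched with a standard token bucket; the adaptive part (the reader adjusting `rate` from the observed backlog) is omitted for brevity, and the class and function names are illustrative:

```python
import threading
import time

class TokenBucket:
    """Token bucket refilled at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def try_take(self, n=1):
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False

def smooth_write(messages, bucket, write_fn):
    """Writer-thread body: spend one token per document; when the bucket
    is empty, back off briefly instead of bursting into ES."""
    for msg in messages:
        while not bucket.try_take():
            time.sleep(0.01)
        write_fn(msg)

out = []
smooth_write(range(20), TokenBucket(rate=10_000, capacity=5), out.append)
```

In the article's design the reader thread would also raise or lower `rate` so the per‑minute spike is spread evenly over the following interval.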
Recall Performance Tuning:
High‑concurrency optimization: multi‑level caching (local BigCache → distributed Redis → ES) cuts the QPS reaching ES from 500 k to roughly 5 % of the original traffic, for a cache hit rate above 95 %.
The cache design guards against cache penetration and cache avalanche, and provides rate limiting as a fallback.
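A minimal sketch of the lookup path and the two guards, with dicts standing in for BigCache and Redis (class and parameter names are assumptions): misses are cached as `None` so repeated queries for nonexistent keys never reach ES (penetration), and TTLs are jittered so entries do not expire in lockstep (avalanche).

```python
import random
import time

class TwoLevelCache:
    """Local cache in front of a shared cache in front of the backing
    store (ES in the article's setup)."""
    def __init__(self, loader, ttl=60):
        self.local, self.shared = {}, {}   # stand-ins for BigCache / Redis
        self.loader, self.ttl = loader, ttl

    def _fresh(self, entry):
        return entry is not None and entry[1] > time.monotonic()

    def get(self, key):
        for tier in (self.local, self.shared):
            entry = tier.get(key)
            if self._fresh(entry):
                return entry[0]
        value = self.loader(key)  # fall through to ES
        # Jittered TTL (avalanche guard); value may be None (penetration guard).
        expiry = time.monotonic() + self.ttl * random.uniform(0.8, 1.2)
        entry = (value, expiry)
        self.local[key] = self.shared[key] = entry
        return value

calls = []
def es_lookup(key):           # hypothetical backing lookup
    calls.append(key)
    return {"v1": "doc"}.get(key)

cache = TwoLevelCache(es_lookup)
cache.get("v1"); cache.get("v1")            # second hit served from cache
cache.get("missing"); cache.get("missing")  # miss cached, ES asked once
```

The fallback rate limiting mentioned above would sit on the final `loader` call, bounding ES traffic even when both tiers miss.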
Elasticsearch Performance Tuning:
Choose an appropriate number of primary shards (6, based on load testing), yielding an average RT of ≈13 ms and a 99th‑percentile RT of ≈39 ms.
Filter out unneeded _source fields and use filter_path to shrink the response from 26 KB to 5 KB, improving throughput by ~30 % and cutting RT by ~3 ms.
Use a routing field (the author account ID) to direct queries to specific shards, increasing recall‑query throughput 6× and overall ES throughput by ~30 %.
Disable indexing and doc_values for fields not used in search or sorting, gaining a further ~10 % throughput.
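The three request‑side optimizations above (source filtering, `filter_path`, and routing) combine into a single search request. A sketch of the parameters, assuming illustrative index and field names (the ES query options themselves, `routing`, `filter_path`, and `_source`, are standard):

```python
def build_recall_request(author_id, tags):
    """Build a recall query applying the response-size and routing
    optimizations: routing pins the query to the author's shard,
    filter_path strips response metadata, and _source keeps only
    the fields the recall layer actually reads."""
    return {
        "index": "video_index",                  # hypothetical index name
        "routing": author_id,                    # query only the author's shard
        "filter_path": "hits.hits._source",      # drop response envelope fields
        "body": {
            "_source": ["doc_id", "score_features"],  # hypothetical fields
            "query": {"bool": {"filter": [
                {"term": {"author_id": author_id}},
                {"terms": {"tags": tags}},
            ]}},
        },
    }

req = build_recall_request("acct_1", ["news"])
```

These parameters would be passed to an ES client's search call; the payload shape here is just a container for them, not a specific client's API.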
Conclusion: The indexing solution presented for the Kankan video recommendation system offers low development cost, flexibility, rich functionality, and high performance, supporting a range of recall scenarios (focus, cold‑start, tag‑based, account‑based). Since launch it has driven significant business growth, and future work will continue to refine the architecture.
Tencent Cloud Developer