
Huya Live Streaming Recommendation Architecture: Business Background, System Design, Vector Retrieval, and Ranking

This article presents a comprehensive overview of Huya Live's recommendation system, covering business background, system architecture, vector retrieval techniques, ranking pipeline, technical challenges, implementation details, and future outlook, highlighting scalability and performance optimizations.

DataFunTalk

Hi, I am Li Cha from Huya Live's recommendation engineering team, responsible for the recommendation architecture of the platform. Live streaming recommendation focuses on top streamers, emphasizing relationship graphs, textual cues, and long‑term value, which leads to distinct engineering requirements.

Business Background

Huya Live has three main recommendation scenarios: homepage live streams, square video recommendations, and live‑room ad recommendations, along with many smaller use cases.

Live streaming is a domain where top streamers dominate, requiring high relevance to relationship chains and long‑term value, which differentiates it from typical image or video recommendation pipelines.

Business Architecture

The architecture follows the classic recommendation pipeline with some customizations. The ingestion layer handles pass‑through, fusion, degradation, and deduplication. The profiling layer provides long‑term, short‑term, and real‑time user and streamer features. Downstream modules include recall, ranking, re‑ranking, and supporting platform services.

Unlike typical image/video recommendation where strict deduplication is needed (e.g., Bloom filters), Huya’s live scenario requires fast, high‑frequency deduplication because streamer tags may change at broadcast start, demanding low‑latency updates.

Vector Retrieval

1. Background

In 2016 Google disclosed YouTube’s vector‑based retrieval architecture, inspiring many systems to improve embeddings for better business metrics.

Huya initially used brute‑force search, but as streamer count grew, it became infeasible, prompting a shift to vector retrieval early last year.

We evaluated Facebook’s Faiss and Google’s ScaNN; ScaNN showed superior performance and recall on our benchmarks.

2. Technical Challenges

Production requires high‑throughput, low‑latency, highly available service.

Data must be refreshed quickly to meet business needs while providing fault tolerance.

Efficient index building and online quality guarantees are essential.

3. Architecture Implementation

We designed a read‑write‑separated, file‑based architecture. The index builder produces vectors and writes them as .npy files, reducing size and easing debugging. Distribution uses Alibaba’s open‑source Dragonfly for P2P file delivery.
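A minimal sketch of the file-based handoff, assuming float32 embeddings (the function names are illustrative): the builder serializes vectors with `np.save`, and the reader memory-maps the file, which keeps loads fast and lets engineers inspect a shard directly in a Python shell for debugging.

```python
import numpy as np


def write_index_shard(vectors, path):
    """Persist embedding vectors as a compact .npy file for distribution."""
    arr = np.asarray(vectors, dtype=np.float32)  # float32 halves size vs float64
    np.save(path, arr)


def load_index_shard(path):
    """Memory-map the shard: loading is near-instant and easy to inspect."""
    return np.load(path, mmap_mode="r")
```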

The online server consists of a retrieval engine and an operator module, both accessed via SDK.

Retrieval Engine: supports ANN and brute‑force search, with load/unload control and double‑buffer switching for zero‑downtime updates.
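The double-buffer pattern can be sketched as follows (a simplified illustration, not Huya's engine): queries always read the active slot, while a new index version is loaded into the standby slot off the hot path and then made live with a single reference flip.

```python
import threading


class DoubleBufferIndex:
    """Zero-downtime index updates via two buffers (illustrative sketch).

    Readers use the active buffer; writers load a new version into the
    standby buffer, then flip the active slot.
    """

    def __init__(self, initial_index):
        self._buffers = [initial_index, None]
        self._active = 0                        # which slot is live
        self._swap_lock = threading.Lock()      # serializes writers only

    def search(self, query_fn):
        # Readers take no lock: snapshot the active slot once, then query it.
        index = self._buffers[self._active]
        return query_fn(index)

    def load_and_swap(self, new_index):
        with self._swap_lock:
            standby = 1 - self._active
            self._buffers[standby] = new_index  # load off the hot path
            self._active = standby              # flip: new version goes live
```

In-flight queries keep using the buffer they snapshotted, so an update never blocks or corrupts a read.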

Operator Module: follows a generic operator interface, making it easy to extend and reuse.

Deployment is managed through a control platform, improving iteration speed.

Online queries flow as follows: the SDK obtains a user profile, generates an embedding, and performs a top‑k ANN search. The SDK can skip any step, allowing direct vector queries for debugging.
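The steps above can be sketched as a small pipeline. This is an illustrative stand-in, using brute-force inner-product search in place of the real ANN engine, with hypothetical names (`recommend`, `top_k_search`, `embed_fn`); note the full flow decomposes so any stage can be bypassed, e.g. passing a raw vector straight to `top_k_search` for debugging.

```python
import numpy as np


def top_k_search(index_vectors, query_vec, k=20):
    """Brute-force inner-product top-k (stand-in for the ANN engine)."""
    scores = index_vectors @ query_vec
    top = np.argsort(-scores)[:k]
    return top.tolist(), scores[top].tolist()


def recommend(user_id, profile_store, embed_fn, index_vectors, k=20):
    """Full flow: profile lookup -> embedding -> top-k retrieval."""
    profile = profile_store[user_id]      # step 1: fetch user profile
    query_vec = embed_fn(profile)         # step 2: generate query embedding
    return top_k_search(index_vectors, query_vec, k)  # step 3: top-k search
```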

Our system achieves high throughput via lock‑free double‑buffer loading, batch queries, in‑memory computation, LRU caching, and instruction‑set optimizations, while maintaining low latency and high availability.

Data updates are fast: a 2‑million‑record dataset loads into memory within 5 seconds and finishes distribution in 10 seconds. Versioned index files enable multi‑version online serving with a one‑minute rollout.

Offline builder optimizations include a semi‑automatic hyper‑parameter search tool, horizontal scaling with distributed locks, multi‑process parallel index building, and extensive metric validation (latency, recall, coverage). Top‑20 recall reaches 0.99, with a success rate above 90%.
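The recall metric used in such validation is standard: compare the ANN engine's top-k against an exact brute-force top-k and measure the overlap. A minimal sketch (function name assumed for illustration):

```python
def recall_at_k(ann_results, exact_results, k=20):
    """Fraction of the exact top-k that the ANN search recovered.

    ann_results / exact_results are ranked lists of item ids, ANN first.
    """
    ann = set(ann_results[:k])
    exact = set(exact_results[:k])
    return len(ann & exact) / len(exact) if exact else 0.0
```

A top-20 recall of 0.99 means that, on average, the ANN index misses fewer than one item per query out of the exact top 20.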

Ranking

1. Data Flow

Data flow includes offline training, online scoring, and feature processing. Features are derived from long‑term, short‑term, and real‑time user/streamer signals. User profile service uses LRU caching and graceful degradation; streamer profiles employ localized double‑buffer caches to handle high read amplification.
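The LRU-plus-degradation pattern for the profile service can be sketched as below (an illustrative sketch, not the production service; `ProfileCache` and its interface are assumed names): cache hits skip the backend entirely, and backend failures return a default profile instead of failing the request.

```python
from collections import OrderedDict


class ProfileCache:
    """LRU cache for user profiles with graceful degradation (sketch)."""

    def __init__(self, fetch_fn, capacity=10000, default=None):
        self._fetch = fetch_fn
        self._capacity = capacity
        self._default = default if default is not None else {}
        self._cache = OrderedDict()  # user_id -> profile, LRU order

    def get(self, user_id):
        if user_id in self._cache:
            self._cache.move_to_end(user_id)  # mark as recently used
            return self._cache[user_id]
        try:
            profile = self._fetch(user_id)
        except Exception:
            return self._default  # degrade gracefully, don't fail the request
        self._cache[user_id] = profile
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict least recently used
        return profile
```

The streamer side differs because a popular streamer's profile is read by many concurrent requests (high read amplification), which is why the article's localized double-buffer caches fit better there than per-request LRU lookups.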

2. Features

We convert plaintext features to TFRecord format using Protocol Buffers for schema validation. Offline feature extraction leverages JNI to keep consistency with online processing.

3. Inference Optimizations

Integrated the gRPC‑based inference service as a dynamic library to fit the company’s ecosystem.

Applied common community optimizations: model warm‑up and dedicated thread pools.

Bandwidth throttling during peak hours to limit model download traffic.

Moved user‑side feature copying into the inference service, reducing bandwidth by over 50 %.

After these optimizations, the ranking service achieves four‑nines (99.99%) availability and reduces data transmission bandwidth by more than 50%.

Summary and Outlook

We will continue to refine the architecture, follow emerging business needs, and improve platform iteration efficiency.

Thank you for listening.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
