
Design and Implementation of a Live Streaming Highlight System with AI Optimization

The paper details a live‑streaming highlight system that integrates heterogeneous data sources through a three‑stage pipeline backed by MySQL and Redis. It applies sliding‑window interval optimization and AI‑driven title generation, scoring, and segment selection, manages lifecycle state with a shared state machine, and outlines planned stability and observability improvements.

Bilibili Tech

The document presents a technical overview of a live‑streaming highlight system, covering its business value, architecture, data generation mechanisms, storage design, interval optimization algorithm, AI‑driven enhancements, state‑machine management, and future roadmap.

Background : Live streaming replay is essential for user engagement and data analysis. Highlight segments capture memorable moments, improve fan interaction, and provide valuable data for content creators.

System Overview : The highlight system integrates multiple heterogeneous data sources (danmaku, interaction logs, revenue, PK games, voice chat, AI‑generated content) and must handle highly concurrent requests for fan‑generated clips.

Architecture : The system follows a three‑stage pipeline: data generation (active and passive triggers), unified data ingestion, and downstream processing. Active triggers are initiated by streamers (anchors) or fans after a stream ends, while passive triggers arrive over RPC or MQ and create highlights in real time.

Data Generation :

Active trigger: creates a generation task only when no highlight exists yet for the session.

Passive trigger: processes real‑time requests (e.g., AI‑detected events) received via RPC/MQ.
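The active-trigger rule can be sketched as a check-then-create step. This is a minimal illustration, not the system's actual code: an in-memory dict stands in for the highlight_get_record table, and the names SessionKey, ActiveTrigger, request, and mark_success, as well as the returned strings, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SessionKey:
    """Identifies one highlight query; fields mirror the table columns."""
    uid: int
    live_key: str
    highlight_type: int


class ActiveTrigger:
    """Creates at most one generation task per (uid, live_key, highlight_type)."""

    def __init__(self) -> None:
        # Hypothetical stand-in for highlight_get_record: key -> status,
        # where 0 means not yet successful and 1 means successful.
        self._records: Dict[SessionKey, int] = {}

    def request(self, key: SessionKey) -> str:
        status = self._records.get(key)
        if status is None:
            # No record for this session yet, so create the task.
            self._records[key] = 0
            return "task_created"
        return "ready" if status == 1 else "in_progress"

    def mark_success(self, key: SessionKey) -> None:
        """Called by the (assumed) worker once highlights are stored."""
        self._records[key] = 1
```

In a real deployment the existence check and the insert would need to be atomic, for example through a unique index on the three columns, so that two simultaneous requests cannot both create a task.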

Data Storage : The solution combines MySQL for persistent storage and Redis for caching status flags.

MySQL schema (highlight_get_record):

CREATE TABLE `highlight_get_record` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `uid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `roomid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'room ID',
  `live_key` varchar(100) NOT NULL DEFAULT '' COMMENT 'live session ID',
  `highlight_type` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight type',
  `status` tinyint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'query status: 0 = not successful, 1 = successful',
  -- ......
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='highlight query record table';

MySQL schema (highlight_data):

CREATE TABLE `highlight_data` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `uid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `roomid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'room ID',
  `live_key` varchar(100) NOT NULL DEFAULT '' COMMENT 'live session ID',
  `highlight_type` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight type',
  `title` varchar(256) NOT NULL DEFAULT '' COMMENT 'title',
  `start_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight segment start time',
  `end_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight segment end time',
  `score` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight score',
  `status` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight status',
  -- ......
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='highlight data table';

Redis caches the status of each highlight under a composite key of the form uid:live_key:highlight_type, which reduces database load.
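A cache-aside read over that composite key might look like the sketch below. It is illustrative only: a plain dict stands in for Redis, load_from_db is a hypothetical callback standing in for the MySQL query, and the db_reads counter exists purely to show how often the database is touched.

```python
from typing import Callable, Dict, Optional


def status_cache_key(uid: int, live_key: str, highlight_type: int) -> str:
    """Build the composite key uid:live_key:highlight_type."""
    return f"{uid}:{live_key}:{highlight_type}"


class StatusCache:
    """Cache-aside lookup for highlight status.

    `store` is a dict standing in for Redis; `load_from_db` is a
    hypothetical callback standing in for a MySQL lookup.
    """

    def __init__(
        self, load_from_db: Callable[[int, str, int], Optional[int]]
    ) -> None:
        self.store: Dict[str, int] = {}
        self.load_from_db = load_from_db
        self.db_reads = 0  # counts fallbacks to the database

    def get_status(
        self, uid: int, live_key: str, highlight_type: int
    ) -> Optional[int]:
        key = status_cache_key(uid, live_key, highlight_type)
        if key in self.store:
            return self.store[key]  # cache hit: no database access
        self.db_reads += 1
        status = self.load_from_db(uid, live_key, highlight_type)
        if status is not None:
            self.store[key] = status  # populate the cache for later reads
        return status
```

A production version would also set an expiry on each key and refresh or delete the entry whenever the status changes in MySQL.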

Highlight Interval Optimization : To locate high‑value segments in long streams (often >6 hours), the system extracts minute‑level metrics (PCU, danmaku, revenue) and applies sliding‑window scoring. The algorithm computes average values over windows of varying sizes (e.g., 3‑point vs. 4‑point averages) and uses weighted coefficients (k₁, k₂, …) to decide whether a longer window should be selected. Formulas for area approximation and coefficient selection are illustrated in the original figures.
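Since the original formulas live in figures that are not reproduced here, the following is only a rough sketch of the idea under stated assumptions. It assumes each minute gets a single score as a weighted sum of the three metrics, and that a window grows by one minute whenever the best longer window's average stays at or above k times the current average. The weights, the coefficient values in ks, and the function names are all hypothetical.

```python
from typing import List, Sequence, Tuple


def minute_scores(
    pcu: Sequence[float],
    danmaku: Sequence[float],
    revenue: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[float]:
    """Combine per-minute metrics into one score (weights are assumed)."""
    w_pcu, w_dm, w_rev = weights
    return [w_pcu * p + w_dm * d + w_rev * r
            for p, d, r in zip(pcu, danmaku, revenue)]


def best_window(scores: Sequence[float], size: int) -> Tuple[int, float]:
    """Return (start index, average) of the highest-average window."""
    if size <= 0 or size > len(scores):
        raise ValueError("window size must be in 1..len(scores)")
    window_sum = sum(scores[:size])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(scores) - size + 1):
        # Slide by one minute: add the entering value, drop the leaving one.
        window_sum += scores[start + size - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / size


def select_interval(
    scores: Sequence[float],
    min_size: int = 3,
    ks: Sequence[float] = (0.9, 0.9, 0.9),
) -> Tuple[int, int]:
    """Pick a half-open minute interval [start, end).

    Starting from the best `min_size` window, try one growth step per
    coefficient in `ks`; accept the longer window only if its average
    is at least k times the current average.
    """
    start, avg = best_window(scores, min_size)
    size = min_size
    for k in ks:
        if size + 1 > len(scores):
            break
        next_start, next_avg = best_window(scores, size + 1)
        if next_avg < k * avg:
            break
        start, avg, size = next_start, next_avg, size + 1
    return start, start + size
```

For example, with scores [1, 1, 10, 10, 10, 10, 1, 1] the 3-minute and 4-minute windows both average 10, so the window grows to four minutes; the best 5-minute window averages 8.2, which is below 0.9 × 10, so the result is minutes 2 through 5.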

AI‑Driven Enhancements :

Title generation: ASR transcribes audio, and a large language model creates engaging titles.

Quality scoring: AI evaluates each clip’s audio‑derived transcript and assigns a score for ranking.

Precise segment selection: Sentence‑level scoring of subtitles determines the most coherent highlight interval.

Challenges include model QPS pressure, caching of generated results, and model tuning via online A/B testing.
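The source does not say how sentence scores are turned into an interval. One plausible approach, shown here purely as an assumption, is to subtract a baseline from each sentence score and take the contiguous run with the largest total (a maximum-subarray scan, often called Kadane's algorithm). The Sentence type, the baseline value, and the score range are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sentence:
    start: float  # seconds from the start of the stream
    end: float
    score: float  # hypothetical model score, assumed to lie in [0, 1]


def select_segment(
    sentences: List[Sentence], baseline: float = 0.5
) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the best contiguous run of sentences.

    Each sentence contributes (score - baseline); the run with the
    largest positive total wins. Returns None if no run is positive.
    """
    best_sum = 0.0
    best_range: Optional[Tuple[int, int]] = None
    run_sum, run_first = 0.0, 0
    for i, sentence in enumerate(sentences):
        if run_sum <= 0.0:
            # A non-positive prefix cannot help; start a new run here.
            run_sum, run_first = 0.0, i
        run_sum += sentence.score - baseline
        if run_sum > best_sum:
            best_sum, best_range = run_sum, (run_first, i)
    if best_range is None:
        return None
    first, last = best_range
    return sentences[first].start, sentences[last].end
```

Because the interval always begins and ends on sentence boundaries, the resulting clip does not cut a sentence in half, which is one way to read "coherent" in the description above.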

State‑Machine Optimization : A shared state‑machine abstracts lifecycle stages across modules, reducing duplicated logic and simplifying maintenance.
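A shared state machine of this kind can be expressed as a single transition table that every module consults. The sketch below is illustrative: the state names, the allowed transitions, and the class names are assumptions, not the system's actual lifecycle.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


# Allowed transitions (hypothetical); modules share this one table
# instead of each re-implementing its own status checks.
TRANSITIONS: Dict[State, Set[State]] = {
    State.PENDING: {State.PROCESSING},
    State.PROCESSING: {State.READY, State.FAILED},
    State.FAILED: {State.PENDING},  # retry path
    State.READY: set(),             # terminal state
}


class InvalidTransition(Exception):
    """Raised when a module requests a transition the table forbids."""


class HighlightLifecycle:
    def __init__(
        self,
        transitions: Dict[State, Set[State]] = TRANSITIONS,
        initial: State = State.PENDING,
    ) -> None:
        self._transitions = transitions
        self.state = initial

    def advance(self, target: State) -> State:
        if target not in self._transitions[self.state]:
            raise InvalidTransition(f"{self.state.name} -> {target.name}")
        self.state = target
        return self.state
```

Keeping the rules in data rather than in scattered conditionals means a new lifecycle stage is added in one place, which matches the stated goal of reducing duplicated logic.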

Data Presentation : The highlight API returns sorted highlights based on type, AI score, and duration, while filtering overlapping segments using a configurable overlap threshold.
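The ranking and overlap filtering could be sketched as a sort followed by a greedy pass. The details here are assumptions: lower highlight_type values are treated as higher priority, ties are broken by higher score and then longer duration, and overlap is measured as the intersection divided by the shorter segment's duration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Highlight:
    highlight_type: int
    score: int
    start_time: int  # seconds
    end_time: int    # seconds

    @property
    def duration(self) -> int:
        return self.end_time - self.start_time


def overlap_ratio(a: Highlight, b: Highlight) -> float:
    """Intersection length divided by the shorter duration (assumed metric)."""
    intersection = min(a.end_time, b.end_time) - max(a.start_time, b.start_time)
    shorter = min(a.duration, b.duration)
    if intersection <= 0 or shorter <= 0:
        return 0.0
    return intersection / shorter


def rank_and_filter(
    items: List[Highlight], max_overlap: float = 0.5
) -> List[Highlight]:
    """Sort by (type, score desc, duration desc), then drop heavy overlaps."""
    ordered = sorted(
        items, key=lambda h: (h.highlight_type, -h.score, -h.duration)
    )
    kept: List[Highlight] = []
    for candidate in ordered:
        # Keep a candidate only if it does not overlap too much with
        # any higher-ranked highlight already kept.
        if all(overlap_ratio(candidate, k) <= max_overlap for k in kept):
            kept.append(candidate)
    return kept
```

Filtering after sorting means that when two segments overlap beyond the threshold, the higher-ranked one is always the survivor.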

Future Plans : Improve service stability, conduct failure‑drills, build unified observability tools, and continue AI‑based feature enhancements to boost user experience.
