AI-Powered Sports Video Applications: Figure Skating Action Recognition, Multimodal Classification, and Football Highlight Clipping
The article showcases three AI‑driven sports video solutions—real‑time figure‑skating action recognition with ST‑GCN, multimodal video classification merging text, image and audio via ERNIE and TextCNN, and automated football highlight clipping using TSN‑BMN‑LSTM—each achieving over 85% accuracy, fully open‑source on PaddlePaddle with one‑click notebooks and a live developer session.
Recent Winter Olympic highlights featuring athletes such as Gu Ailing, Wu Dajing, and Su Yiming have drawn attention not only to the sporting achievements but also to the AI technologies that support them. AI is being used for motion‑recognition‑assisted training and scoring, as well as for intelligent classification and automated video editing, dramatically reducing manual effort and time costs.
To help developers explore these AI applications, Baidu PaddlePaddle, Baidu Cloud, and Associate Professor Liu Shenglan from Dalian University of Technology have released three industry‑practice examples covering the entire workflow from data preparation to model optimization and deployment.
1. Figure Skating Action Recognition
This scenario tackles the high complexity of figure‑skating movements by applying the ST‑GCN (Spatial‑Temporal Graph Convolutional Network) model to human‑pose keypoints. The system recognizes and labels technical moves in real time, assisting with scoring and motion‑quality assessment. Challenges include distinguishing actions from only a few frames and separating sub‑categories with subtle differences. By modifying the network architecture and tuning parameters such as batch_size and num_classes, the solution achieves 91% accuracy.
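The core idea of ST‑GCN is to treat the skeleton as a graph (joints as nodes, bones as edges) and convolve features over that graph at each frame. The sketch below is not the tutorial's PaddlePaddle code; it is a minimal numpy illustration of one spatial graph‑convolution step, with a made‑up 5‑joint chain skeleton and random weights standing in for learned parameters.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def spatial_gcn_layer(X, A_norm, W):
    """One spatial graph convolution applied independently per frame.
    X: (T, V, C_in) keypoint features over T frames and V joints.
    W: (C_in, C_out) weight matrix (random here, learned in practice).
    Returns (T, V, C_out)."""
    # Aggregate each joint's neighbors via A_norm, then mix channels via W.
    return np.einsum('vu,tuc,cd->tvd', A_norm, X, W)

# Toy skeleton: 5 joints connected in a chain (e.g. head-neck-hip-knee-ankle).
V, T, C_in, C_out = 5, 8, 3, 16
A = np.zeros((V, V))
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1

rng = np.random.default_rng(0)
X = rng.standard_normal((T, V, C_in))   # e.g. (x, y, confidence) per joint
W = rng.standard_normal((C_in, C_out))
out = spatial_gcn_layer(X, normalize_adjacency(A), W)
print(out.shape)  # (8, 5, 16)
```

A full ST‑GCN stacks such layers with temporal convolutions over the frame axis, so the model sees how each joint's features evolve across the action.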
2. Multimodal Sports Video Classification
Sports videos contain rich textual, audio, and visual information. This example fuses features from three modalities—text, image, and audio—using a pre‑trained ERNIE model (frozen) combined with a TextCNN, and applies cross‑attention to enhance inter‑modal interaction. The pipeline improves high‑level semantic tagging, reaching 85.59% accuracy. The approach addresses challenges such as limited high‑quality labeled data, modality semantic gaps, and noise in long videos.
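Cross‑attention lets features from one modality query the others, so the fused representation emphasizes the visual and audio evidence most relevant to the text. This is a hedged numpy sketch of that mechanism, not the tutorial's implementation: the feature dimensions, pooling choices, and random inputs are placeholders for frozen‑ERNIE text features and frame/audio features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, d_k):
    """Scaled dot-product cross-attention: query tokens attend to context tokens.
    query: (Lq, d), context: (Lc, d). Returns fused features of shape (Lq, d)."""
    scores = query @ context.T / np.sqrt(d_k)   # (Lq, Lc) similarity matrix
    return softmax(scores, axis=-1) @ context   # weighted sum of context tokens

rng = np.random.default_rng(1)
d = 32
text_feat  = rng.standard_normal((10, d))   # stand-in for frozen-ERNIE token features
image_feat = rng.standard_normal((16, d))   # stand-in for frame-level visual features
audio_feat = rng.standard_normal((12, d))   # stand-in for audio-segment features

# Text queries attend to the concatenated visual/audio context; we then
# mean-pool and concatenate with pooled text features as a clip vector
# that a classifier head could consume.
context = np.concatenate([image_feat, audio_feat], axis=0)
fused = cross_attention(text_feat, context, d)
clip_vector = np.concatenate([text_feat.mean(axis=0), fused.mean(axis=0)])
print(clip_vector.shape)  # (64,)
```

In the actual pipeline each modality would first pass through its own encoder (ERNIE for text, a CNN such as TextCNN over titles/subtitles, and video/audio backbones) before fusion.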
3. Football Video Highlight Clipping
Automatic highlight extraction requires precise detection of action boundaries in videos that often contain redundant background. The solution adopts a TSN+BMN+LSTM backbone, leverages PaddlePaddle’s PP‑TSM/TSN models for frame‑level feature extraction, and employs data augmentation and temporal proposal techniques. The final system attains 91% accuracy and an F1‑score of 76.2%.
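The boundary‑matching idea behind BMN can be illustrated without the full network: given per‑frame start, end, and actionness curves (produced by the temporal model in practice), candidate segments are formed by pairing confident start/end boundaries and scored by boundary and interior confidence. The numpy sketch below uses hand‑crafted toy curves and a simplified scoring rule; thresholds and the scoring formula are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def propose_segments(start_prob, end_prob, actionness, thresh=0.5, max_len=20):
    """BMN-style sketch: pair high-probability start/end boundaries into
    candidate segments and score each by boundary x interior confidence."""
    starts = np.where(start_prob >= thresh)[0]
    ends = np.where(end_prob >= thresh)[0]
    proposals = []
    for s in starts:
        for e in ends:
            if s < e <= s + max_len:
                # Simplified confidence: boundary scores times mean actionness.
                score = start_prob[s] * end_prob[e] * actionness[s:e + 1].mean()
                proposals.append((int(s), int(e), float(score)))
    # Highest-confidence proposals first, ready for NMS and clipping.
    return sorted(proposals, key=lambda p: -p[2])

# Toy curves for a 30-frame clip with one highlight spanning frames 8-18.
T = 30
start_prob = np.zeros(T); start_prob[8] = 0.9
end_prob = np.zeros(T); end_prob[18] = 0.8
actionness = np.zeros(T); actionness[8:19] = 0.7

props = propose_segments(start_prob, end_prob, actionness)
print(props[0][:2])  # (8, 18)
```

In the full system, TSN/PP‑TSM supplies the frame‑level features, BMN produces the boundary and confidence maps densely over all candidate durations, and an LSTM refines the temporal context before the final clip selection.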
All three tutorials are fully open‑source on GitHub (https://github.com/PaddlePaddle/awesome-DeepLearning) and can be run with one‑click notebooks on AI Studio. Detailed documentation covers data preprocessing, model selection, optimization tricks, and deployment steps, enabling developers to quickly prototype AI solutions for sports video analysis.
A live session will be held on February 17 (20:00‑21:30) where Professor Liu and Baidu engineers will walk through the end‑to‑end workflow and answer questions. Participants can join the WeChat group for free access to the replay and a chance to receive a handbook of industry practice examples across smart city, manufacturing, finance, and internet domains.
Baidu Geek Talk