Tag

multi-modal learning


HomeTech
Jul 7, 2023 · Artificial Intelligence

Multi-Modal Video Understanding and AIGC Video Generation at Autohome

This article presents a comprehensive multi-modal video understanding system for AIGC video generation, detailing technical architecture, GCN-based semi-supervised learning, and practical applications across automotive content scenarios.

AIGC · BERT · NeXtVLAD
0 likes · 8 min read
Youku Technology
Mar 23, 2021 · Artificial Intelligence

Text-Video Alignment Algorithm for Automated Short Video Production at Youku

Youku’s new text‑video alignment system automatically generates short video summaries by extracting multimodal video and linguistic features, matching sentences to clips through embedding and tag‑level models, and enabling AI‑driven auto‑editing that cuts production time from days to minutes.

BERT · NLP · cross-modal matching
0 likes · 10 min read
Youku Technology
Jun 8, 2020 · Artificial Intelligence

Video Search Technology and Multi-modal Applications at Alibaba Youku

Alibaba’s Youku video search platform combines a six-layer architecture (data extraction, technology integration, recall, relevance, ranking, and intent understanding) with CV, NLP, knowledge graphs, and multi‑modal cues such as face, OCR, and audio recognition to overcome title‑mismatch, entity, and semantic challenges and deliver precise, diverse video retrieval.

Knowledge Graph · Natural Language Processing · information retrieval
0 likes · 15 min read
NetEase Media Technology Team
Apr 4, 2019 · Artificial Intelligence

Video Recommendation System: Framework, Topic Clustering, and Related Video Retrieval

The paper proposes a video recommendation framework that combines recall and ranking modules, using a multi‑modal topic clustering approach—integrating audio, visual, and textual features via NeXtVLAD, PCA, and K‑Means—to generate unified video representations, improve candidate selection, and boost click‑through and viewing time, while addressing cold‑start and semantic relevance challenges.

A/B Testing · NeXtVLAD · cold-start problem
0 likes · 7 min read