How Knowledge Graphs Are Transforming Multi‑Modal AI: A Deep Survey

This comprehensive survey examines over 300 recent papers on knowledge‑graph‑driven multi‑modal learning and multi‑modal knowledge graphs, outlining key tasks, datasets, benchmarks, challenges, and future directions, while highlighting the impact of large language models and multimodal pre‑training techniques.

KG4MM · Knowledge Graphs · MMKG

10 min read

HomeTech

Jul 7, 2023 · Artificial Intelligence

Multi-Modal Video Understanding and AIGC Video Generation at Autohome

This article presents a comprehensive multi-modal video understanding system for AIGC video generation, detailing technical architecture, GCN-based semi-supervised learning, and practical applications across automotive content scenarios.

AIGC · BERT · NeXtVLAD

8 min read

Youku Technology

Mar 23, 2021 · Artificial Intelligence

Text-Video Alignment Algorithm for Automated Short Video Production at Youku

Youku’s new text‑video alignment system automatically generates short video summaries by extracting multimodal video and linguistic features, matching sentences to clips through embedding and tag‑level models, and enabling AI‑driven auto‑editing that cuts production time from days to minutes.

BERT · NLP · cross-modal matching

10 min read

Youku Technology

Jun 8, 2020 · Artificial Intelligence

Video Search Technology and Multi-modal Applications at Alibaba Youku

Alibaba’s Youku video search platform is built on a six-layer architecture: data extraction, technology integration, recall, relevance, ranking, and intent understanding. It leverages CV, NLP, knowledge graphs, and multi‑modal cues such as face, OCR, and audio recognition to overcome title‑mismatch, entity, and semantic challenges and deliver precise, diverse video retrieval.

Machine Learning · information retrieval · multi-modal learning

15 min read

NetEase Media Technology Team

Apr 4, 2019 · Artificial Intelligence

Video Recommendation System: Framework, Topic Clustering, and Related Video Retrieval

The paper proposes a video recommendation framework that combines recall and ranking modules. Its multi‑modal topic clustering approach integrates audio, visual, and textual features via NeXtVLAD, PCA, and K‑Means to generate unified video representations, improve candidate selection, and boost click‑through and viewing time, while addressing cold‑start and semantic relevance challenges.

A/B testing · Hierarchical Clustering · NeXtVLAD

7 min read
