Tag

video analysis

iQIYI Technical Product Team
Nov 7, 2024 · Artificial Intelligence

Multimodal Speaker Diarization for Long-Form Video Scripts

iQIYI’s multimodal speaker diarization system splits long‑form video using subtitle timestamps and scene detection, extracts voiceprints with a custom model, clusters them hierarchically, and combines an Active Speaker Detection algorithm with face recognition to assign speakers. It reaches roughly 90% precision and recall and improves downstream tasks such as summarization, translation, and dubbing.

dialogue recognition · iQIYI · multimodal AI
8 min read
IT Xianyu
Mar 5, 2024 · Artificial Intelligence

Open-Source AI Platform A‑SOiD Enables Video‑Based Behavior Recognition and Prediction

Researchers from Carnegie Mellon University and the University of Bonn have released the open‑source A‑SOiD platform, which learns and predicts user‑defined behaviors solely from video, offering transparent, bias‑aware AI that can be applied to animal studies, human actions, and diverse pattern‑recognition domains.

AI · Open-source · behavior recognition
6 min read
Tencent Cloud Developer
Nov 11, 2022 · Artificial Intelligence

Tencent Advertising Multimedia AI Technology: Research and Application

Liu Wei outlines Tencent’s Advertising Multimedia AI ecosystem on the Taiji platform, describing a five‑platform matrix: Jue for content understanding, Qiankun for automated video creation, Shenzhen for AI‑driven review, Tianyin for hierarchical fingerprinting, and Hunyuan as a multimodal large model. Innovations include massive multimodal pre‑training, logo retrieval, QA‑style attribute extraction, spatiotemporal video analysis, advanced auto‑judgment, and high‑performance hashing, achieving top cross‑modal retrieval results.

Advertising Technology · Large Language Models · computer vision
18 min read
IEG Growth Platform Technology Team
Feb 14, 2022 · Artificial Intelligence

Multimodal Evolution and Application in Tencent Game Advertising System

This article describes the end‑to‑end multimodal modeling pipeline—covering text, image, and video understanding, model evolution from shallow to deep networks, key‑frame extraction, fine‑tuning, and multimodal fusion—used in Tencent's game ad exchange platform, along with practical deployment challenges and solutions.

CNN · Text Classification · Transformer
22 min read
DataFunTalk
Nov 22, 2020 · Artificial Intelligence

Short Video Analysis in Local Life Scenarios: Techniques and Practices at Meituan

This article presents Meituan's AI-driven short video analysis workflow, covering industry trends, multi‑label video classification, intelligent cover selection, and video generation techniques, while discussing challenges, model building, label expansion, continuous data iteration, and future outlook for video AI in local services.

AI · Meituan · computer vision
16 min read
DataFunSummit
Nov 5, 2020 · Artificial Intelligence

Short Video Analysis for Local Life Scenarios: Techniques and Practices at Meituan

This article presents Meituan's AI‑driven short‑video analysis pipeline for local‑life scenarios, covering industry trends, multi‑label classification, intelligent cover selection, and video generation, and discusses model construction, label‑system expansion, continuous data iteration, and practical applications in restaurant and hotel domains.

AI · Meituan · computer vision
16 min read
DataFunTalk
Oct 22, 2020 · Artificial Intelligence

Analyzing Video Excitement: Methods, Frameworks, and Applications

This article presents a comprehensive overview of video excitement analysis, covering quality, aesthetics, and narrative factors, describing a multimodal framework with supervised, weakly supervised, and multi‑task models, and illustrating practical applications such as preview generation, clipping, and automatic cover creation.

Weak Supervision · content recommendation · excitement detection
14 min read
DataFunTalk
Jul 31, 2020 · Artificial Intelligence

WeChat 'Kan Kan' Content Understanding: Architecture and Techniques for Recommendation

This article details the technical architecture behind WeChat's 'Kan Kan' content understanding platform, covering text and multimedia analysis, tag extraction, entity recognition, knowledge graph construction, and how these components enhance recommendation recall, ranking, and user engagement across the ecosystem.

Recommendation systems · content understanding · knowledge graph
46 min read
Youku Technology
Jul 29, 2020 · Artificial Intelligence

Core Technology of Video Content Understanding: Technical Practice of Partial Re-ID in Video Inspection

The talk explains how Alibaba’s Entertainment Content Operation Platform applies a Partial‑ReID algorithm to overcome the challenges of person re‑identification in heavily edited video content, enabling accurate cross‑shot character matching, richer appearance data, and metrics such as presence, interaction, and storyline for improved video quality assessment.

AI · Partial Re-ID · Person Re-identification
2 min read
Amap Tech
Jul 9, 2020 · Artificial Intelligence

AMAP-TECH Algorithm Competition: Dynamic Road‑Condition Analysis from In‑Vehicle Video Images

Alibaba Amap’s AMAP‑TECH competition invites participants to develop AI computer‑vision models that classify real‑time road conditions—smooth, slow, or congested—from short sequences of dash‑cam images, using a labeled dataset of 1,500 training sequences and a weighted F1‑score evaluation, with cash prizes up to ¥60,000.

AI · competition · computer vision
8 min read
Youku Technology
Jun 19, 2020 · Artificial Intelligence

Video-based Temporal Event Detection Methods

In the fourth Alibaba Digital Media Technology Night Talk, algorithm engineer Liu Xiaolong presents an overview of video‑based temporal event detection, covering its problem background, representative prior works, and the latest research advances within the MEDIA AI Algorithm Challenge series.

Alibaba · Artificial Intelligence · Temporal Event Detection
1 min read
DataFunTalk
Apr 1, 2020 · Artificial Intelligence

Knowledge Graph‑Based Multimodal Semantic Understanding at Baidu

This article outlines Baidu's large‑scale knowledge graph applications in AI, detailing the need for multimodal semantic understanding, the challenges of text and video comprehension, and technical solutions including entity annotation, conceptualization, knowledge networks, and multimodal fusion for enhanced search, recommendation, and visual question answering.

conceptualization · entity annotation · knowledge graph
15 min read
Zhuanzhuan QA
Nov 13, 2019 · Frontend Development

Performance Optimization of M Page: Achieving Sub‑Second Load and Zero White Screen via Video Frame Analysis

This article describes how the M page’s user‑perceived performance was dramatically improved by applying techniques such as SSR, skeleton screens, image compression, and a video‑frame analysis testing method that delivers millisecond‑level response‑time measurements, enabling sub‑second load times and eliminating white‑screen delays.

Optimization · SSR · Skeleton Screen
5 min read
iQIYI Technical Product Team
Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University achieved a MAP of 0.8986 and third place in the iQIYI multimodal video retrieval challenge by fusing official face embeddings with scene features, using channel‑attention‑based video feature fusion, a multimodal SE‑ResNeXt module, and a carefully partitioned model ensemble.

deep learning · feature fusion · iQIYI challenge
7 min read
DataFunTalk
May 21, 2019 · Artificial Intelligence

Multimodal Video Analysis and Its Applications: Intelligent Asset Management, Automatic Cover Generation, Knowledge Graph, and Search

This article presents a comprehensive overview of research on multimodal video analysis from Alibaba's entertainment division, covering intelligent video asset management, automated cover creation with personalized distribution, video knowledge graph construction, multimodal search techniques, and future directions in AI-driven media processing.

AI · cover generation · knowledge graph
17 min read
Youku Technology
May 6, 2019 · Artificial Intelligence

Exploring Intelligent Production at Youku: AI‑Driven Video Analysis and Automation

The talk describes Youku’s intelligent production platform, which uses AI and cloud computing to automatically analyze video frames and extract fine‑grained metadata such as scenes, persons, actions and scores. The platform then generates highlights, vertical clips, annotations and feedback for editors and upstream producers. The talk also covers challenges such as pose tracking and graph‑based action classification, along with future plans for deeper video understanding and open competitions.

AI · Image Search · Pose Estimation
14 min read
iQIYI Technical Product Team
Apr 12, 2019 · Artificial Intelligence

iQIYI Multimodal Technology: Datasets, Applications, and Future Directions

iQIYI leverages multimodal AI—combining audio, visual, and textual cues—to advance video understanding, releasing the world’s largest celebrity dataset (iQIYI‑VID), powering applications such as actor‑focused playback, AI Radar, emoji generation, and rapid automated editing, while pursuing future research in emoji captioning, cross‑modal retrieval, visual question answering, and broader health‑care and education uses.

computer vision · datasets · iQIYI
13 min read
DataFunTalk
Dec 16, 2018 · Artificial Intelligence

Practical Applications of Video Content Understanding at Hulu

This article details Hulu's AI-driven techniques for fine-grained video segmentation, end‑cap detection, subtitle detection and language recognition, background‑music classification, automated processing pipelines, tag generation, and cover‑image regeneration, illustrating how these methods improve user experience and operational efficiency.

AI pipelines · CNN · content understanding
14 min read
JD Tech
May 4, 2018 · Artificial Intelligence

Optical Flow: Principles, Methods, and Applications in Computer Vision

This article introduces the fundamentals and evolution of optical flow, covering classic algorithms such as Horn‑Schunck and Lucas‑Kanade, modern deep‑learning approaches like FlowNet, and their practical applications in video detection, semantic segmentation, and novel view synthesis.

CNN · computer vision · deep learning
15 min read