Tagged articles

26 articles

Apr 28, 2026 · Artificial Intelligence

Zero‑Learning Video to Semantic Vector Pipeline with MaxFrame’s Distributed AI Engine

Faced with exploding video volumes and bottlenecks in frame extraction, labeling, and vector storage, MaxFrame offers a three‑step, end‑to‑end distributed pipeline that turns raw videos into searchable semantic vectors while providing zero‑threshold scaling, transparent OSS mounting, row‑level fault tolerance, and elastic concurrency control.

MaxCompute · MaxFrame · OSS

0 likes · 6 min read

AsiaInfo Technology: New Tech Exploration

Nov 4, 2025 · Artificial Intelligence

How Multimodal Large Models Are Revolutionizing Video Analysis

This article examines the evolution from single‑frame video analysis to multimodal large models, detailing their architecture, optimization techniques, experimental validation on edge devices, and practical scenarios, while highlighting current limitations and future directions for AI‑driven video understanding.

AI · Large Models · Multimodal

0 likes · 20 min read

iQIYI Technical Product Team

Nov 7, 2024 · Artificial Intelligence

Multimodal Speaker Diarization for Long-Form Video Scripts

iQIYI’s multimodal speaker diarization system splits long‑form video using subtitle timestamps and scene detection, extracts voiceprints with a custom model, clusters them hierarchically, and combines an Active Speaker Detection algorithm with face recognition to assign speakers. It achieves around 90% precision and recall and improves downstream tasks such as summarization, translation, and dubbing.

dialogue recognition · iQIYI · multimodal AI

0 likes · 8 min read

NetEase Smart Enterprise Tech+

Mar 12, 2024 · Artificial Intelligence

How Advanced Video AI Transforms Content Moderation and Retrieval

This article explores how modern video AI techniques—ranging from transformer‑based classification to semi‑supervised retrieval and token‑halting acceleration—enable efficient, accurate detection of prohibited content and fast, scalable video search in the era of short‑form media.

AI moderation · Semi-supervised Learning · Transformer

0 likes · 18 min read

IT Xianyu

Mar 5, 2024 · Artificial Intelligence

Open-Source AI Platform A‑SOiD Enables Video‑Based Behavior Recognition and Prediction

Researchers from Carnegie Mellon University and the University of Bonn have released the open‑source A‑SOiD platform, which learns and predicts user‑defined behaviors solely from video, offering transparent, bias‑aware AI that can be applied to animal studies, human actions, and diverse pattern‑recognition domains.

AI · Machine Learning · behavior recognition

0 likes · 6 min read

Alibaba Cloud Developer

Dec 19, 2022 · Artificial Intelligence

How AI Transforms Football Video Analysis: Detection, Tracking, and Event Recognition

This article explores how artificial intelligence techniques such as deep learning, object detection, multi‑object tracking, and coordinate projection are applied to football video analysis to automatically detect the ball and players, map their positions onto the field, and recognize key events like shots and goals.

AI · computer vision · object detection

0 likes · 16 min read

Tencent Cloud Developer

Nov 11, 2022 · Artificial Intelligence

Tencent Advertising Multimedia AI Technology: Research and Application

Liu Wei outlines Tencent’s Advertising Multimedia AI ecosystem on the Taiji platform, describing a five‑platform matrix—Jue for content understanding, Qiankun for automated video creation, Shenzhen for AI‑driven review, Tianyin for hierarchical fingerprinting, and Hunyuan as a multimodal large model—featuring innovations such as massive multimodal pre‑training, logo retrieval, QA‑style attribute extraction, spatiotemporal video analysis, advanced auto‑judgment, and high‑performance hashing that achieve top cross‑modal retrieval results.

advertising technology · computer vision · content understanding

0 likes · 18 min read

IEG Growth Platform Technology Team

Feb 14, 2022 · Artificial Intelligence

Multimodal Evolution and Application in Tencent Game Advertising System

This article describes the end‑to‑end multimodal modeling pipeline—covering text, image, and video understanding, model evolution from shallow to deep networks, key‑frame extraction, fine‑tuning, and multimodal fusion—used in Tencent's game ad exchange platform, along with practical deployment challenges and solutions.

Advertising · CNN · Multimodal Learning

0 likes · 22 min read

ByteDance SE Lab

Jul 23, 2021 · Mobile Development

How to Accurately Measure Mobile App Response Time Using Video Frame Detection and OCR

This article presents a method for precisely measuring mobile app response latency by extracting video frames, detecting start and end frames through image markers and OCR, and calculating the time difference, offering a high‑precision, customizable solution for performance evaluation across diverse app scenarios.

OCR · app latency · frame detection

0 likes · 12 min read
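
The timing step in the summary above reduces to simple arithmetic once the start and end frames are known. The sketch below is a minimal illustration, not the article's implementation: the function names `first_index` and `response_time_ms` are hypothetical, and the marker/OCR detection is stood in for by an arbitrary predicate over frames.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def first_index(frames: Sequence[Frame],
                matches: Callable[[Frame], bool],
                start: int = 0) -> Optional[int]:
    """Return the index of the first frame at or after `start` for which
    `matches` is true, or None if no frame matches.

    `matches` is a placeholder for the real detector (image marker or OCR).
    """
    for idx in range(start, len(frames)):
        if matches(frames[idx]):
            return idx
    return None


def response_time_ms(start_frame: int, end_frame: int, fps: float) -> float:
    """Convert a frame-index gap into milliseconds at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    return (end_frame - start_frame) * 1000.0 / fps


# Hypothetical recording: frames are labeled strings instead of images.
frames = ["idle"] * 30 + ["tapped"] * 45 + ["loaded"] * 10
tap = first_index(frames, lambda f: f == "tapped")
done = first_index(frames, lambda f: f == "loaded", start=tap)
latency = response_time_ms(tap, done, fps=60.0)
```

The resolution of such a measurement is bounded by the frame interval (about 16.7 ms at 60 fps), so a higher capture frame rate gives finer timing.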

DataFunTalk

Nov 22, 2020 · Artificial Intelligence

Short Video Analysis in Local Life Scenarios: Techniques and Practices at Meituan

This article presents Meituan's AI-driven short video analysis workflow, covering industry trends, multi‑label video classification, intelligent cover selection, and video generation techniques, while discussing challenges, model building, label expansion, continuous data iteration, and future outlook for video AI in local services.

AI · Meituan · Video Generation

0 likes · 16 min read

DataFunSummit

Nov 5, 2020 · Artificial Intelligence

Short Video Analysis for Local Life Scenarios: Techniques and Practices at Meituan

This article presents Meituan's AI‑driven short‑video analysis pipeline for local‑life scenarios, covering industry trends, multi‑label classification, intelligent cover selection, and video generation, and discusses model construction, label‑system expansion, continuous data iteration, and practical applications in restaurant and hotel domains.

AI · Meituan · Video Generation

0 likes · 16 min read

DataFunTalk

Oct 22, 2020 · Artificial Intelligence

Analyzing Video Excitement: Methods, Frameworks, and Applications

This article presents a comprehensive overview of video excitement analysis, covering quality, aesthetics, and narrative factors, describing a multimodal framework with supervised, weakly supervised, and multi‑task models, and illustrating practical applications such as preview generation, clipping, and automatic cover creation.

Weak Supervision · content recommendation · excitement detection

0 likes · 14 min read

DataFunTalk

Jul 31, 2020 · Artificial Intelligence

WeChat 'Kan Kan' Content Understanding: Architecture and Techniques for Recommendation

This article details the technical architecture behind WeChat's 'Kan Kan' content understanding platform, covering text and multimedia analysis, tag extraction, entity recognition, knowledge graph construction, and how these components enhance recommendation recall, ranking, and user engagement across the ecosystem.

Knowledge Graph · Machine Learning · Recommendation Systems

0 likes · 46 min read

Youku Technology

Jul 29, 2020 · Artificial Intelligence

Core Technology of Video Content Understanding: Technical Practice of Partial Re-ID in Video Inspection

The talk explains how Alibaba’s Entertainment Content Operation Platform applies a Partial‑ReID algorithm to overcome the challenges of person re‑identification in heavily edited video content, enabling accurate cross‑shot character matching, richer appearance data, and metrics such as presence, interaction, and storyline for improved video quality assessment.

AI · Partial Re-ID · computer vision

0 likes · 2 min read

Amap Tech

Jul 9, 2020 · Artificial Intelligence

AMAP-TECH Algorithm Competition: Dynamic Road‑Condition Analysis from In‑Vehicle Video Images

Alibaba Amap’s AMAP‑TECH competition invites participants to develop AI computer‑vision models that classify real‑time road conditions—smooth, slow, or congested—from short sequences of dash‑cam images, using a labeled dataset of 1,500 training sequences and a weighted F1‑score evaluation, with cash prizes up to ¥60,000.

AI · competition · computer vision

0 likes · 8 min read
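
A weighted F1 score of the kind mentioned in the summary above combines per-class F1 values using fixed class weights. The sketch below is only an illustration: the class labels and the weights in `CLASS_WEIGHTS` are assumptions made for the example, not the competition's published values.

```python
from typing import Dict, Sequence

# Hypothetical labels: 0 = smooth, 1 = slow, 2 = congested.
# The weights are illustrative assumptions, not the official ones.
CLASS_WEIGHTS: Dict[int, float] = {0: 0.2, 1: 0.2, 2: 0.6}


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int],
                weights: Dict[int, float]) -> float:
    """Weighted sum of per-class F1 scores."""
    return sum(w * f1_for_class(y_true, y_pred, c) for c, w in weights.items())


y_true = [0, 0, 1, 2]
y_pred = [0, 1, 1, 2]
score = weighted_f1(y_true, y_pred, CLASS_WEIGHTS)
```

Giving the rarest or most important class the largest weight makes errors on that class dominate the final score.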

Youku Technology

Jun 19, 2020 · Artificial Intelligence

Video-based Temporal Event Detection Methods

In the fourth Alibaba Digital Media Technology Night Talk, algorithm engineer Liu Xiaolong presents an overview of video‑based temporal event detection, covering its problem background, representative prior works, and the latest research advances within the MEDIA AI Algorithm Challenge series.

Alibaba · Artificial Intelligence · Temporal Event Detection

0 likes · 1 min read

DataFunTalk

Apr 1, 2020 · Artificial Intelligence

Knowledge Graph‑Based Multimodal Semantic Understanding at Baidu

This article outlines Baidu's large‑scale knowledge graph applications in AI, detailing the need for multimodal semantic understanding, challenges in text and video comprehension, and the technical solutions including entity annotation, conceptization, knowledge networks, and multimodal fusion for enhanced search, recommendation, and visual question answering.

Knowledge Graph · Visual Question Answering · conceptualization

0 likes · 15 min read

Zhuanzhuan QA

Nov 13, 2019 · Frontend Development

Performance Optimization of M Page: Achieving Sub‑Second Load and Zero White Screen via Video Frame Analysis

This article describes how the M page’s user‑perceived performance was dramatically improved by applying techniques such as SSR, skeleton screens, image compression, and a video‑frame analysis testing method that delivers millisecond‑level response‑time measurements, enabling sub‑second load times and eliminating white‑screen delays.

Frontend · Optimization · SSR

0 likes · 5 min read

iQIYI Technical Product Team

Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University achieved a MAP of 0.8986 and third place in the iQIYI multimodal video retrieval challenge by fusing official face embeddings with scene features, using channel‑attention‑based video feature fusion, a multimodal SE‑ResNeXt module, and a carefully partitioned model ensemble.

Multimodal Retrieval · feature fusion · iQIYI challenge

0 likes · 7 min read

DataFunTalk

May 21, 2019 · Artificial Intelligence

Multimodal Video Analysis and Its Applications: Intelligent Asset Management, Automatic Cover Generation, Knowledge Graph, and Search

This article presents a comprehensive overview of Alibaba's large entertainment division research on multimodal video analysis, covering intelligent video asset management, automated cover creation with personalized distribution, video knowledge graph construction, multimodal search techniques, and future directions in AI-driven media processing.

AI · Knowledge Graph · cover generation

0 likes · 17 min read

Youku Technology

May 6, 2019 · Artificial Intelligence

Exploring Intelligent Production at Youku: AI‑Driven Video Analysis and Automation

The talk describes Youku’s intelligent production platform, which uses AI and cloud computing to automatically analyze video frames, extract fine‑grained metadata such as scenes, persons, actions and scores, and then generate highlights, vertical clips, annotations and feedback for editors and upstream producers, while addressing challenges like pose‑tracking, graph‑based action classification and future plans for deeper video understanding and open competitions.

AI · computer vision · image search

0 likes · 14 min read

iQIYI Technical Product Team

Apr 12, 2019 · Artificial Intelligence

iQIYI Multimodal Technology: Datasets, Applications, and Future Directions

iQIYI leverages multimodal AI—combining audio, visual, and textual cues—to advance video understanding, releasing the world’s largest celebrity dataset (iQIYI‑VID), powering applications such as actor‑focused playback, AI Radar, emoji generation, and rapid automated editing, while pursuing future research in emoji captioning, cross‑modal retrieval, visual question answering, and broader health‑care and education uses.

datasets · iQIYI · multimodal AI

0 likes · 13 min read

DataFunTalk

Dec 16, 2018 · Artificial Intelligence

Practical Applications of Video Content Understanding at Hulu

This article details Hulu's AI-driven techniques for fine-grained video segmentation, end‑cap detection, subtitle detection and language recognition, background‑music classification, automated processing pipelines, tag generation, and cover‑image regeneration, illustrating how these methods improve user experience and operational efficiency.

AI pipelines · CNN · content understanding

0 likes · 14 min read

21CTO

May 8, 2018 · Artificial Intelligence

How Optical Flow Powers 360° Product Views and Advanced Vision Applications

This article explores the evolution and principles of optical flow—from early Horn‑Schunck models and Lucas‑Kanade to modern deep‑learning approaches like FlowNet—detailing its role in JD’s 360° product imaging, video detection, segmentation, view synthesis, and future research challenges in computer vision.

deep learning · image processing · optical flow

0 likes · 15 min read

JD Tech

May 4, 2018 · Artificial Intelligence

Optical Flow: Principles, Methods, and Applications in Computer Vision

This article introduces the fundamentals and evolution of optical flow, covering classic algorithms such as Horn‑Schunck and Lucas‑Kanade, modern deep‑learning approaches like FlowNet, and their practical applications in video detection, semantic segmentation, and novel view synthesis.

CNN · deep learning · image processing

0 likes · 15 min read

MaGe Linux Operations

Jul 1, 2017 · Artificial Intelligence

Detect Looping Video Frames with Perceptual Hashing in Python

This article demonstrates how to use a perceptual average‑hash algorithm in Python to identify duplicate frames in a 24‑hour video, revealing hidden loops and exposing potential video manipulation through systematic frame comparison and analysis.

Python · aHash · duplicate frames

0 likes · 9 min read
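
As a rough illustration of the average-hash idea named in the summary above, the sketch below hashes tiny grayscale frames and reports frames whose hash has been seen before. It is a minimal sketch under stated assumptions, not the article's code: frames are assumed to be already downscaled and converted to grayscale (given here as nested lists of 0-255 values), and the helper names `average_hash`, `hamming`, and `find_repeats` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def average_hash(gray: Sequence[Sequence[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is above the mean.

    A real pipeline would typically shrink each frame to 8x8 first,
    giving a 64-bit hash; any small grid works for this sketch.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-duplicate frames."""
    return bin(a ^ b).count("1")


def find_repeats(hashes: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (first_index, repeat_index) pairs for exactly matching hashes."""
    first_seen: Dict[int, int] = {}
    repeats: List[Tuple[int, int]] = []
    for idx, h in enumerate(hashes):
        if h in first_seen:
            repeats.append((first_seen[h], idx))
        else:
            first_seen[h] = idx
    return repeats


# Three toy 2x2 "frames"; the third repeats the first.
frame_a = [[0, 255], [255, 0]]
frame_b = [[10, 10], [200, 200]]
hashes = [average_hash(f) for f in (frame_a, frame_b, frame_a)]
repeats = find_repeats(hashes)
```

Exact hash matching keeps the scan linear in the number of frames; comparing hashes with `hamming` against a small threshold would also catch frames altered by compression noise, at a higher cost.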
