Tagged articles

8 articles

Mar 18, 2024 · Artificial Intelligence

How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling

The paper presents MuLTI, a multimodal video‑language model that tackles the memory and efficiency challenges of long video‑text sequences by introducing a Text‑Guided MultiWay Sampler and a Multiple Choice Modeling pre‑training task, achieving state‑of‑the‑art results on video QA and retrieval while drastically reducing GPU memory consumption.

Multimodal · efficient-ai · feature fusion

0 likes · 19 min read

NetEase LeiHuo UX Big Data Technology

Aug 11, 2022 · Artificial Intelligence

Multimodal Models: Research Directions and a Practical Case of Game Frame‑Rate Prediction

This article introduces the concept of modality, outlines the five research branches of multimodal models, and presents a concrete case where multimodal deep‑learning techniques are applied to predict and improve game frame rates using both static and temporal features.

AI · Multimodal · feature fusion

0 likes · 9 min read

iQIYI Technical Product Team

Jul 19, 2019 · Artificial Intelligence

Face Quality‑Driven Feature Denoising and Fusion for iQIYI‑VID‑2019 Video Person Recognition

The seefun team leveraged face detection scores and quality metrics to denoise and weight‑fuse facial features during training and testing, using a three‑layer MLP with Swish activation and dropout, and achieved a 0.8983 mAP (fourth place) on the iQIYI‑VID‑2019 video person‑recognition challenge.

MLP · face quality weighting · feature fusion

0 likes · 10 min read

iQIYI Technical Product Team

Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University achieved an mAP of 0.8986 and third place in the iQIYI multimodal video retrieval challenge by fusing official face embeddings with scene features, using channel‑attention‑based video feature fusion, a multimodal SE‑ResNeXt module, and a carefully partitioned model ensemble.

Multimodal Retrieval · feature fusion · iQIYI challenge

0 likes · 7 min read

iQIYI Technical Product Team

Jul 5, 2019 · Artificial Intelligence

Residual Dense Network with Feature Fusion for Multimodal Video Person Identification (iQIYI-VID-2019)

The authors introduce a feature‑fusion pipeline and a Residual Dense Net that leverages multi‑frame face embeddings to identify persons in iQIYI‑VID‑2019 videos, achieving 0.9035 mAP (second place) with only ≈0.5 GFLOPs and processing the full test set in minutes.

Multimodal Learning · feature fusion · iQIYI-VID-2019

0 likes · 11 min read

iQIYI Technical Product Team

Jun 28, 2019 · Artificial Intelligence

Watchdog Team's TOP1 Solution for the iQIYI & ACMMM2019 Multimodal Video Person Recognition Challenge

The Watchdog team took first place (TOP1) in the iQIYI & ACMMM2019 multimodal video person recognition challenge using pre‑extracted multimodal features, a 2048‑dim classifier with BCE loss, re‑ranking, DALI‑accelerated re‑detection, a fine‑tuned InsightFace model, and multi‑model ensembling, reaching ~91% test accuracy.

Multimodal Learning · Re‑ranking · feature fusion

0 likes · 12 min read

iQIYI Technical Product Team

Jun 6, 2019 · Artificial Intelligence

Large-Scale Hierarchical Classification Algorithm for iQIYI Short Videos

iQIYI’s large‑scale hierarchical classification system combines multimodal text and image embeddings, low‑rank multimodal fusion, and a dense hierarchical multilabel network with cascade‑style weighting to assign accurate type tags to short videos, boosting production efficiency and personalized recommendation diversity.

AI · Hierarchical Classification · Multimodal

0 likes · 16 min read

Didi Tech

May 1, 2019 · Artificial Intelligence

Didi AI Labs' DFS Face Detection Algorithm Achieves Top Rankings on the WIDER FACE Benchmark

The DFS face-detection algorithm jointly created by Didi AI Labs and Beijing University's PRIS team secured five first-place and one second-place results on the WIDER FACE benchmark, achieving 96.3% (Easy), 95.4% (Medium) and 90.7% (Hard) AP by leveraging a Feature Fusion Pyramid and semantic-segmentation supervision, and is already deployed in Didi's driver-identity verification and in-vehicle privacy systems.

WIDER FACE · feature fusion · semantic segmentation

0 likes · 5 min read
