Tagged articles
313 articles
DataFunTalk
Feb 9, 2021 · Artificial Intelligence

Multimodal AI Research: Video-Aware Dialog, Dual-Channel Reasoning, and Multimodal Machine Translation

This article surveys recent multimodal AI research, covering video scene‑aware dialog with a GPT‑2‑based unified pre‑training framework, dual‑channel multi‑hop reasoning for visual dialog, capsule‑network‑enhanced multimodal machine translation, and graph‑neural‑network‑driven multimodal translation, and it highlights experimental results and future directions.

Graph Neural Network · Multimodal Learning · machine translation
12 min read
JD Tech
Feb 2, 2021 · Artificial Intelligence

Advances and Trends in Multimodal Digital Content Generation and Automatic Text Summarization

The article reviews recent research on multimodal digital content generation and automatic text summarization. It outlines the evolution from extractive to abstractive methods, highlights four key technology trends (pretrained language models, transformer dominance, knowledge‑enhanced generation, and multimodal‑knowledge joint modeling), and describes an industrial e‑commerce application built on these advances.

Generative Models · e-commerce · knowledge integration
12 min read
DataFunTalk
Oct 22, 2020 · Artificial Intelligence

Analyzing Video Excitement: Methods, Frameworks, and Applications

This article presents a comprehensive overview of video excitement analysis, covering quality, aesthetics, and narrative factors, describing a multimodal framework with supervised, weakly supervised, and multi‑task models, and illustrating practical applications such as preview generation, clipping, and automatic cover creation.

Weak Supervision · content recommendation · excitement detection
14 min read
Meituan Technology Team
Oct 15, 2020 · Artificial Intelligence

Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue

The paper introduces the Answer‑Driven Visual State Estimator (ADVSE), which uses answer‑driven focusing attention and conditional visual information fusion to dynamically incorporate answers into visual dialogue, overcoming static encoding limitations and achieving state‑of‑the‑art performance on the GuessWhat?! question‑generation and guessing tasks.

State Estimation · attention mechanism · goal-oriented
10 min read
DataFunTalk
Jul 31, 2020 · Artificial Intelligence

WeChat 'Kan Kan' Content Understanding: Architecture and Techniques for Recommendation

This article details the technical architecture behind WeChat's 'Kan Kan' content understanding platform, covering text and multimedia analysis, tag extraction, entity recognition, knowledge graph construction, and how these components enhance recommendation recall, ranking, and user engagement across the ecosystem.

Machine Learning · Recommendation Systems · content understanding
46 min read
Xianyu Technology
Jul 9, 2020 · Product Management

Xianyu Product Structuring: Evolution, Current Strategies, and Future Directions

Xianyu’s product‑information structuring has progressed from simple text mining to multimodal AI pipelines that now boost coverage by nearly 50%, though precision and engineering hurdles remain. Planned next steps are a standardized VID attribute system, plug‑in multimodal models, and rule‑based input assistance to enable seamless, photo‑driven publishing.

Rule Engine · data engineering · e-commerce
10 min read
Suning Technology
Apr 9, 2020 · Artificial Intelligence

Affective Computing in Retail: Boosting Customer Experience with Emotion AI

This article explores the development and application of affective computing in the retail sector, covering its psychological foundations, emotion recognition algorithms for facial expressions, speech, and text, multimodal fusion techniques, market players, and future prospects for enhancing shopper experiences, staff service quality, and sales performance.

Affective Computing · Customer Experience · Emotion Recognition
20 min read
JD Tech Talk
Mar 9, 2020 · Artificial Intelligence

Advances in Deep Learning for Content Recommendation and User Behavior Modeling by JD Digits

The article reviews recent deep‑learning breakthroughs in personalized content recommendation, covering news and e‑commerce systems, JD Digits' multi‑dimensional user behavior prediction models, knowledge‑graph meta‑learning, and the impact of multimodal AI on future recommendation technologies.

Recommendation Systems · deep learning · knowledge graph
6 min read
Amap Tech
Dec 6, 2019 · Artificial Intelligence

Semantic Understanding of Merchant Signboards for Automatic POI Name Generation at Amap

Amap's POI naming automation uses a two-stage cascade model: Stage 1 extracts token and sentence features with POS tags and domain-adapted BERT‑POI; Stage 2 employs a Bi‑LSTM to model line relationships, achieving over 95% semantic accuracy and 3‑6% recall improvements, thereby enhancing automatic signboard‑based POI name generation.

BERT · LSTM · Name Generation
7 min read
iQIYI Technical Product Team
Apr 12, 2019 · Artificial Intelligence

iQIYI Multimodal Technology: Datasets, Applications, and Future Directions

iQIYI leverages multimodal AI, combining audio, visual, and textual cues, to advance video understanding. It has released iQIYI‑VID, described as the world’s largest celebrity dataset, and uses the technology in applications such as actor‑focused playback, AI Radar, emoji generation, and rapid automated editing. Future research targets emoji captioning, cross‑modal retrieval, visual question answering, and broader uses in health care and education.

datasets · iQIYI · multimodal AI
13 min read
Youku Technology
Apr 2, 2019 · Artificial Intelligence

How Youku Uses Multimodal AI for Video Understanding, Search, and Recommendation

Youku’s Algorithm Center has built a multimodal AI pipeline that jointly processes visual, audio, and textual signals to enhance video search, recommendation, and digital asset management. The pipeline overcomes the limits of traditional keyword matching, improves relevance, and eases cold‑start problems, while still facing challenges in fusion, cost, and interpretability.

Recommendation Systems · content understanding · media analytics
15 min read
JD Tech
Aug 14, 2018 · Artificial Intelligence

GCN‑LSTM Image Captioning Model by JD AI Research Institute

JD AI Research Institute presented a GCN‑LSTM encoder‑decoder system that integrates object semantic and spatial relationships via graph convolutional networks to significantly improve image captioning performance on the COCO benchmark, achieving state‑of‑the‑art results.

COCO dataset · Image Captioning · LSTM
7 min read
Alibaba Cloud Developer
Dec 7, 2017 · Artificial Intelligence

How Alibaba’s AI Powers Voice Ticketing and Facial Recognition in Shanghai Metro

Alibaba’s AI-driven solutions let Shanghai Metro passengers buy tickets by simply speaking, while the system recognizes faces at turnstiles and analyzes crowd flow in real time. The deployment showcases multimodal voice‑vision interaction, far‑field speech recognition in noisy stations, and advanced computer‑vision techniques.

Smart Transit · facial recognition · multimodal AI
10 min read