AliMe MKG: Building and Applying a Multimodal Knowledge Graph for Live E‑commerce
This report details the design, construction, and deployment of AliMe MKG, a multimodal knowledge graph that powers digital-human anchors and smart assistants in live-stream e-commerce. The graph integrates triple-level, sentence-level, and visual knowledge, built with techniques such as image-text matching, video grounding, multimodal NER, and entity linking.
The presentation introduces AliMe MKG, a multimodal knowledge graph created to support digital‑human anchors that can livestream product introductions 24/7, reducing costs and risks for merchants while enhancing consumer engagement.
A core component is the intelligent script system, which generates multimodal scripts containing text, images, and videos; these scripts are underpinned by a multimodal KG that links scenes, pain points, user intents, and products.
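As a rough illustration of what such a script might look like, here is a hypothetical multimodal script object; the field names, product ID, and URLs are invented for this sketch and do not reflect the production format:

```python
# Hypothetical shape of a generated multimodal script. Each segment pairs
# spoken text with the images and video clips the digital human shows
# while delivering it. All identifiers and URLs below are placeholders.

script = {
    "product_id": "sku_12345",
    "scene": "winter commute",
    "segments": [
        {
            "text": ("Dry air on your morning commute? "
                     "This moisturizer locks in hydration all day."),
            "images": ["https://example.com/img/texture.jpg"],
            "video_clips": [
                # (url, start_s, end_s): a grounded span inside a longer video
                {"url": "https://example.com/v/demo.mp4",
                 "start_s": 12.0, "end_s": 18.5},
            ],
        },
    ],
}
```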
The KG is organized into three knowledge types: (1) triple-level knowledge (scene–pain point–intent–product chains), (2) sentence-level knowledge (detailed product descriptions and usage instructions), and (3) multimodal knowledge (associated images and video clips).
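A minimal sketch of these three knowledge types as Python data structures; the real AliMe MKG schema is not spelled out in the talk, so all field names are illustrative:

```python
from dataclasses import dataclass, field

# Hypothetical data model for the three knowledge types described above.

@dataclass
class TripleKnowledge:
    # Triple-level: a scene raises a pain point, which implies a user
    # intent, which is satisfied by one or more products.
    scene: str             # e.g. "outdoor hiking"
    pain_point: str        # e.g. "sunburn"
    intent: str            # e.g. "sun protection"
    product_ids: list[str]

@dataclass
class SentenceKnowledge:
    # Sentence-level: a detailed description or usage instruction
    # attached to a product.
    product_id: str
    text: str
    source: str            # e.g. "review", "detail_page", "micro_blog"

@dataclass
class MultimodalKnowledge:
    # Multimodal: images and video clips aligned to a product;
    # clips carry (url, start_s, end_s) spans.
    product_id: str
    image_urls: list[str] = field(default_factory=list)
    video_clips: list[tuple[str, float, float]] = field(default_factory=list)
```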
AliMe MKG’s architecture consists of a pattern layer and an instance layer, with new node types added for scenes, pain points, intents, sentences, and visual media, enabling rich, logical connections between users and products.
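The pattern/instance split can be pictured as a schema that instance data is validated against. The sketch below assumes a simple in-memory graph with illustrative node and relation names; it is not the production ontology:

```python
# Pattern layer: the allowed node types and typed edges of the graph.
# Names are illustrative stand-ins for the ontology described in the talk.
PATTERN = {
    "node_types": {"Scene", "PainPoint", "Intent", "Product",
                   "Sentence", "Image", "VideoClip"},
    "edge_types": {
        ("Scene", "raises", "PainPoint"),
        ("PainPoint", "triggers", "Intent"),
        ("Intent", "satisfied_by", "Product"),
        ("Product", "described_by", "Sentence"),
        ("Sentence", "illustrated_by", "Image"),
        ("Sentence", "illustrated_by", "VideoClip"),
    },
}

class InstanceLayer:
    """Instance layer: concrete nodes/edges checked against the pattern."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.nodes = {}   # node_id -> node_type
        self.edges = []   # (src_id, relation, dst_id)

    def add_node(self, node_id, node_type):
        if node_type not in self.pattern["node_types"]:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, src, relation, dst):
        sig = (self.nodes[src], relation, self.nodes[dst])
        if sig not in self.pattern["edge_types"]:
            raise ValueError(f"edge violates pattern layer: {sig}")
        self.edges.append((src, relation, dst))

kg = InstanceLayer(PATTERN)
kg.add_node("s1", "Scene")
kg.add_node("p1", "PainPoint")
kg.add_edge("s1", "raises", "p1")
```

Keeping the pattern layer separate is what allows new node types, such as scenes or video clips, to be introduced without disturbing existing instances.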
Knowledge extraction proceeds along three tracks: phrase mining and entity recognition for triple-level knowledge; pipeline extraction from micro-blog articles, reviews, and product detail pages for sentence-level knowledge; and image-text matching plus video grounding for multimodal data, employing models such as Vision Transformer, StructBERT, and dual-stream transformers.
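For the multimodal track, a dual-stream matcher embeds text and images into a shared space and ranks candidate images by similarity. The sketch below substitutes random-projection stubs for the real encoders (StructBERT for text and a Vision Transformer for images, per the talk); only the ranking logic is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a StructBERT-style text encoder (random stub).
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

def encode_image(image_id: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a ViT-style image encoder (random stub).
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

def rank_images(sentence: str, image_ids: list[str]) -> list[tuple[str, float]]:
    # Score each candidate image by cosine similarity to the sentence
    # embedding, then sort best-first.
    t = encode_text(sentence)
    scored = [(img, float(encode_image(img) @ t)) for img in image_ids]
    return sorted(scored, key=lambda x: x[1], reverse=True)

print(rank_images("moisturizer for dry winter skin", ["img_a", "img_b", "img_c"]))
```

Video grounding follows the same pattern, scoring candidate clip spans within a video against a sentence instead of scoring whole images.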
Recent research advances cover multimodal NER, where prompt-driven extraction of image information boosts entity recognition, and multimodal entity linking, which combines multimodal candidate retrieval with contrastive learning for disambiguation.
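The contrastive disambiguation step can be pictured as an InfoNCE-style objective: the mention embedding should score higher against its gold entity than against the other retrieved candidates. The embeddings below are random placeholders for real multimodal encoders, and this loss form is a common choice rather than the paper's exact formulation:

```python
import numpy as np

def info_nce(mention: np.ndarray, candidates: np.ndarray,
             gold_idx: int, temperature: float = 0.07) -> float:
    # Cosine similarity between the mention and each candidate entity.
    sims = candidates @ mention / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(mention))
    logits = sims / temperature
    # Cross-entropy against the gold candidate: -log softmax(logits)[gold].
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[gold_idx]))

rng = np.random.default_rng(1)
mention = rng.standard_normal(32)           # mention embedding (placeholder)
candidates = rng.standard_normal((5, 32))   # retrieved candidate entities
print(info_nce(mention, candidates, gold_idx=2))
```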
Applications of AliMe MKG span digital-human anchors that deliver scripted product broadcasts and a smart assistant in live rooms that recommends personalized multimodal content; effectiveness is measured via conversion rates, A/B testing, and offline metrics for reliability, diversity, and vividness.
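On the A/B side, comparing conversion rates between a control live room and one using MKG-driven content typically reduces to a two-proportion test. A minimal sketch with made-up counts (not figures from the talk):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    # Two-proportion z-test: is variant B's conversion rate significantly
    # different from control A's?
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return p_a, p_b, z, p_value

# Hypothetical traffic split: 10,000 sessions per arm.
print(two_proportion_z(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000))
```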
The talk concludes with acknowledgments and an invitation to explore the underlying papers presented at SIGIR and CIKM.