AliMe MKG: Building and Applying a Multimodal Knowledge Graph for Live E‑commerce
This report details the design, construction, and deployment of AliMe MKG, a multimodal knowledge graph that powers digital-human anchors and smart assistants in live-stream e-commerce. The graph integrates triple-level, sentence-level, and visual knowledge, built with techniques such as image-text matching, video grounding, multimodal NER, and entity linking.
The presentation introduces AliMe MKG, a multimodal knowledge graph created to support digital‑human anchors that can livestream product introductions 24/7, reducing costs and risks for merchants while enhancing consumer engagement.
A core component is the intelligent script system, which generates multimodal scripts containing text, images, and videos; these scripts are underpinned by a multimodal KG that links scenes, pain points, user intents, and products.
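As a rough illustration of what such a script might look like, here is a hypothetical multimodal script object; the field names, product ID, and URLs are invented for this sketch and do not reflect the production format:

```python
# Hypothetical shape of a generated multimodal script. Each segment pairs
# spoken text with the images and video clips the digital human shows
# while delivering it. All identifiers and URLs below are placeholders.

script = {
    "product_id": "sku_12345",
    "scene": "winter commute",
    "segments": [
        {
            "text": ("Dry air on your morning commute? "
                     "This moisturizer locks in hydration all day."),
            "images": ["https://example.com/img/texture.jpg"],
            "video_clips": [
                # (url, start_s, end_s): a grounded span inside a longer video
                {"url": "https://example.com/v/demo.mp4",
                 "start_s": 12.0, "end_s": 18.5},
            ],
        },
    ],
}
```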
The KG is organized into three knowledge types: (1) triple-level knowledge (scene–pain point–intent–product chains), (2) sentence-level knowledge (detailed product descriptions and usage instructions), and (3) multimodal knowledge (associated images and video clips).
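A minimal sketch of these three knowledge types as Python data structures; the real AliMe MKG schema is not spelled out in the talk, so all field names are illustrative:

```python
from dataclasses import dataclass, field

# Hypothetical data model for the three knowledge types described above.

@dataclass
class TripleKnowledge:
    # Triple-level: a scene raises a pain point, which implies a user
    # intent, which is satisfied by one or more products.
    scene: str             # e.g. "outdoor hiking"
    pain_point: str        # e.g. "sunburn"
    intent: str            # e.g. "sun protection"
    product_ids: list[str]

@dataclass
class SentenceKnowledge:
    # Sentence-level: a detailed description or usage instruction
    # attached to a product.
    product_id: str
    text: str
    source: str            # e.g. "review", "detail_page", "micro_blog"

@dataclass
class MultimodalKnowledge:
    # Multimodal: images and video clips aligned to a product;
    # clips carry (url, start_s, end_s) spans.
    product_id: str
    image_urls: list[str] = field(default_factory=list)
    video_clips: list[tuple[str, float, float]] = field(default_factory=list)
```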
AliMe MKG’s architecture consists of a pattern layer and an instance layer, with new node types added for scenes, pain points, intents, sentences, and visual media, enabling rich, logical connections between users and products.
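The pattern/instance split can be pictured as a schema that instance data is validated against. The sketch below assumes a simple in-memory graph with illustrative node and relation names; it is not the production ontology:

```python
# Pattern layer: the allowed node types and typed edges of the graph.
# Names are illustrative stand-ins for the ontology described in the talk.
PATTERN = {
    "node_types": {"Scene", "PainPoint", "Intent", "Product",
                   "Sentence", "Image", "VideoClip"},
    "edge_types": {
        ("Scene", "raises", "PainPoint"),
        ("PainPoint", "triggers", "Intent"),
        ("Intent", "satisfied_by", "Product"),
        ("Product", "described_by", "Sentence"),
        ("Sentence", "illustrated_by", "Image"),
        ("Sentence", "illustrated_by", "VideoClip"),
    },
}

class InstanceLayer:
    """Instance layer: concrete nodes/edges checked against the pattern."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.nodes = {}   # node_id -> node_type
        self.edges = []   # (src_id, relation, dst_id)

    def add_node(self, node_id, node_type):
        if node_type not in self.pattern["node_types"]:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, src, relation, dst):
        sig = (self.nodes[src], relation, self.nodes[dst])
        if sig not in self.pattern["edge_types"]:
            raise ValueError(f"edge violates pattern layer: {sig}")
        self.edges.append((src, relation, dst))

kg = InstanceLayer(PATTERN)
kg.add_node("s1", "Scene")
kg.add_node("p1", "PainPoint")
kg.add_edge("s1", "raises", "p1")
```

Keeping the pattern layer separate is what allows new node types, such as scenes or video clips, to be introduced without disturbing existing instances.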
Knowledge extraction proceeds along three tracks: phrase mining and entity recognition for triple-level knowledge; pipeline extraction from micro-blog articles, reviews, and product detail pages for sentence-level knowledge; and image-text matching plus video grounding for multimodal data, employing models such as Vision Transformer, StructBERT, and dual-stream transformers.
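For the multimodal track, a dual-stream matcher embeds text and images into a shared space and ranks candidate images by similarity. The sketch below substitutes random-projection stubs for the real encoders (StructBERT for text and a Vision Transformer for images, per the talk); only the ranking logic is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a StructBERT-style text encoder (random stub).
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

def encode_image(image_id: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a ViT-style image encoder (random stub).
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

def rank_images(sentence: str, image_ids: list[str]) -> list[tuple[str, float]]:
    # Score each candidate image by cosine similarity to the sentence
    # embedding, then sort best-first.
    t = encode_text(sentence)
    scored = [(img, float(encode_image(img) @ t)) for img in image_ids]
    return sorted(scored, key=lambda x: x[1], reverse=True)

print(rank_images("moisturizer for dry winter skin", ["img_a", "img_b", "img_c"]))
```

Video grounding follows the same pattern, scoring candidate clip spans within a video against a sentence instead of scoring whole images.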
Recent research advances cover multimodal NER, where prompt-driven extraction of image information boosts entity recognition, and multimodal entity linking, which combines multimodal candidate retrieval with contrastive learning for disambiguation.
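The contrastive disambiguation step can be pictured as an InfoNCE-style objective: the mention embedding should score higher against its gold entity than against the other retrieved candidates. The embeddings below are random placeholders for real multimodal encoders, and this loss form is a common choice rather than the paper's exact formulation:

```python
import numpy as np

def info_nce(mention: np.ndarray, candidates: np.ndarray,
             gold_idx: int, temperature: float = 0.07) -> float:
    # Cosine similarity between the mention and each candidate entity.
    sims = candidates @ mention / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(mention))
    logits = sims / temperature
    # Cross-entropy against the gold candidate: -log softmax(logits)[gold].
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[gold_idx]))

rng = np.random.default_rng(1)
mention = rng.standard_normal(32)           # mention embedding (placeholder)
candidates = rng.standard_normal((5, 32))   # retrieved candidate entities
print(info_nce(mention, candidates, gold_idx=2))
```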
Applications of AliMe MKG span digital-human anchors that deliver scripted product broadcasts and a smart assistant in live rooms that recommends personalized multimodal content; effectiveness is measured via conversion rates, A/B testing, and offline metrics for reliability, diversity, and vividness.
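On the A/B side, comparing conversion rates between a control live room and one using MKG-driven content typically reduces to a two-proportion test. A minimal sketch with made-up counts (not figures from the talk):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    # Two-proportion z-test: is variant B's conversion rate significantly
    # different from control A's?
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return p_a, p_b, z, p_value

# Hypothetical traffic split: 10,000 sessions per arm.
print(two_proportion_z(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000))
```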
The talk concludes with acknowledgments and an invitation to explore the underlying papers presented at SIGIR and CIKM.