Knowledge Graph Construction and Applications in Alibaba B2B E‑commerce
This article explains how Alibaba B2B leverages knowledge‑graph technology—from its historical roots in knowledge engineering and expert systems to modern semantic‑web models, extraction pipelines, reasoning methods, storage solutions, and representation learning—to improve search, recommendation, and scene‑based procurement incentives in e‑commerce platforms.
The article, authored by Alibaba CBU Technology and sourced from the book "Alibaba B2B E‑commerce Algorithm Practice," introduces the role of knowledge graphs in e‑commerce platforms for precise search, recommendation, and incentivizing user procurement.
It reviews the evolution of knowledge engineering and expert systems, citing the 1977 AI conference paper that defined expert systems and the successful DEC XCON configuration system, and lists the key characteristics of expert systems.
The transition from semantic networks to knowledge graphs is described, highlighting the shift from Web 1.0 to Web 2.0 and the emergence of the semantic Web (Web 3.0). Knowledge graphs are defined as structured semantic knowledge bases composed of entity‑relation‑entity triples.
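As a minimal sketch of the entity‑relation‑entity triple structure described above, the following Python snippet stores a few triples and looks up the outgoing edges of an entity. The entity and relation names (`cotton_tshirt`, `has_material`, and so on) are hypothetical examples, not taken from Alibaba's graph.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One (head entity, relation, tail entity) fact."""
    head: str
    relation: str
    tail: str


# Hypothetical example facts, for illustration only.
TRIPLES = [
    Triple("cotton_tshirt", "has_material", "cotton"),
    Triple("cotton_tshirt", "belongs_to_category", "apparel"),
    Triple("cotton", "is_a", "natural_fiber"),
]


def neighbors(entity: str, triples: List[Triple]) -> List[Tuple[str, str]]:
    """Return (relation, tail) pairs for every triple whose head is `entity`."""
    return [(t.relation, t.tail) for t in triples if t.head == entity]
```

Calling `neighbors("cotton_tshirt", TRIPLES)` returns the two facts whose head is that entity, which is the basic lookup that search and recommendation features build on.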
Common open knowledge graphs such as WordNet, Freebase, YAGO, and OpenKG are introduced, along with vertical‑domain graphs like Alibaba's product and scene‑recommendation graphs.
The knowledge‑graph construction pipeline is outlined: knowledge extraction, reasoning, and storage, emphasizing the need for multi‑source data handling.
Knowledge extraction techniques are detailed, covering rule‑based methods, statistical machine‑learning approaches (HMM, CRF), and deep‑learning models (Bi‑LSTM‑CRF, IDCNN‑CRF, BERT‑based architectures) for entity and attribute extraction.
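Sequence labelers such as CRF or Bi‑LSTM‑CRF typically emit one tag per token, often in the BIO scheme. The sketch below shows only the final decoding step that turns such tags into entity spans; the tag names (`MATERIAL`, `PRODUCT`, `CAPACITY`) and the example tokens are assumptions for illustration, and the tagger itself is not shown.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert per-token BIO tags into (entity text, entity type) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Close the span that is currently open, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O" tags and inconsistent I- tags end the open span.
            flush()
            current_tokens, current_type = [], None
    flush()
    return spans


# Hypothetical tagger output for a product title.
tokens = ["stainless", "steel", "water", "bottle", "500", "ml"]
tags = ["B-MATERIAL", "I-MATERIAL", "B-PRODUCT", "I-PRODUCT", "B-CAPACITY", "I-CAPACITY"]
entities = decode_bio(tokens, tags)
```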
Relation extraction methods are categorized into template‑based and supervised learning approaches, with examples of classifiers such as SVM and CNN, and discussion of pipeline versus joint extraction frameworks.
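Template‑based relation extraction can be illustrated with a couple of regular‑expression patterns. This is a rough sketch only: the two templates and the relation names `has_material` and `produced_by` are hypothetical, and a production system would need far more patterns or a supervised classifier.

```python
import re
from typing import List, Tuple

# Each template pairs a pattern containing named groups with a relation label.
# Both templates are illustrative assumptions.
TEMPLATES = [
    (re.compile(r"(?P<head>[\w ]+?) is made of (?P<tail>[\w ]+)"), "has_material"),
    (re.compile(r"(?P<head>[\w ]+?) is produced by (?P<tail>[\w ]+)"), "produced_by"),
]


def extract_relations(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (head, relation, tail) triples for every template that matches."""
    results = []
    for pattern, relation in TEMPLATES:
        match = pattern.search(sentence)
        if match:
            head = match.group("head").strip()
            tail = match.group("tail").strip()
            results.append((head, relation, tail))
    return results
```

For the sentence "The bottle is made of stainless steel." the first template fires and yields one `has_material` triple. Templates are precise but brittle, which is one reason supervised and joint extraction models are used alongside them.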
Knowledge fusion processes, including entity linking, disambiguation, and coreference resolution, are explained as essential steps to clean and integrate extracted data.
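A very small sketch of entity linking is shown below: mentions are normalized, looked up in an alias table, and matched fuzzily as a fallback using the standard library's `difflib`. The alias table, the entity identifiers, and the 0.85 similarity cutoff are all assumptions chosen for illustration; real systems usually add context‑based disambiguation on top of string matching.

```python
import difflib
from typing import Optional

# Hypothetical alias table mapping surface forms to canonical entity IDs.
ALIASES = {
    "stainless steel": "material/stainless_steel",
    "inox": "material/stainless_steel",
    "cotton": "material/cotton",
}


def normalize(mention: str) -> str:
    """Lowercase the mention and collapse runs of whitespace."""
    return " ".join(mention.lower().split())


def link_entity(mention: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a raw mention to a canonical entity ID, or None if nothing is close."""
    key = normalize(mention)
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to approximate string matching against the known aliases.
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None
```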
Knowledge reasoning is divided into logical reasoning (first‑order predicate logic, description logic) and graph‑based reasoning (path ranking, graph neural embeddings).
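To make the logical‑reasoning side concrete, here is a minimal forward‑chaining sketch that applies a single transitivity rule until no new facts appear. The relation name `subcategory_of` and the example facts are hypothetical; description‑logic reasoners and path‑ranking models are considerably more involved than this.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]


def infer_transitive(facts: Set[Fact], relation: str) -> Set[Fact]:
    """Apply the rule (a r b) and (b r c) => (a r c) until a fixpoint is reached."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(closed):
            if r1 != relation:
                continue
            for c, r2, d in list(closed):
                if r2 == relation and b == c and (a, relation, d) not in closed:
                    closed.add((a, relation, d))
                    changed = True
    return closed


# Hypothetical category hierarchy.
facts = {
    ("tshirt", "subcategory_of", "tops"),
    ("tops", "subcategory_of", "apparel"),
}
inferred = infer_transitive(facts, "subcategory_of")
```

After the call, `inferred` also contains the derived fact that `tshirt` is a subcategory of `apparel`, which was never stated explicitly.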
Storage solutions are compared: relational storage, using triple‑table or type‑table schemas on databases such as MySQL and Oracle, and native graph databases (Neo4j, OrientDB, GraphDB, Alibaba GDB), with examples of schema designs.
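The triple‑table approach can be sketched with Python's built‑in `sqlite3` module standing in for a relational database such as MySQL. The table name, column names, index, and sample rows are assumptions for illustration, not the schema used in the article.

```python
import sqlite3

# In-memory database used as a stand-in for a production relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)"
)
# An index on (subject, predicate) speeds up the most common lookup.
conn.execute("CREATE INDEX idx_subject_predicate ON triples (subject, predicate)")

# Hypothetical facts: every triple becomes one row.
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("water_bottle", "has_material", "stainless_steel"),
        ("water_bottle", "has_material", "silicone"),
        ("water_bottle", "belongs_to_category", "drinkware"),
    ],
)


def objects_of(connection, subject, predicate):
    """Return all objects for a (subject, predicate) pair, sorted by name."""
    cursor = connection.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in cursor.fetchall()]
```

A single triple table is simple and schema‑free, but multi‑hop queries require repeated self‑joins, which is a common motivation for type tables or native graph databases.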
Knowledge representation learning models are surveyed, from distance models such as Structured Embedding (SE) to translation models (TransE, TransH, TransR), including their loss functions and margin‑based training.
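As a small worked sketch of the translation idea, TransE scores a triple by the distance between `head + relation` and `tail`, and margin‑based training pushes true triples to score lower than corrupted ones by at least a margin. The code below uses plain Python lists and hand‑picked 2‑dimensional vectors as assumptions; it shows the scoring and loss only, not an optimizer.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
EmbeddedTriple = Tuple[Vector, Vector, Vector]


def transe_distance(head: Vector, relation: Vector, tail: Vector) -> float:
    """L2 norm of (head + relation - tail); smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(positive: EmbeddedTriple, negative: EmbeddedTriple, margin: float = 1.0) -> float:
    """Hinge loss: max(0, margin + d(positive) - d(negative))."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))


# Hand-picked toy embeddings: head + relation lands exactly on the true tail.
head, relation = [1.0, 0.0], [0.0, 1.0]
true_tail, corrupted_tail = [1.0, 1.0], [4.0, 5.0]

positive = (head, relation, true_tail)
negative = (head, relation, corrupted_tail)
loss = margin_loss(positive, negative)
```

Here the true triple has distance 0 and the corrupted one has distance 5, so the margin is already satisfied and the loss is 0. TransH and TransR keep the same loss but first project entities onto a relation‑specific hyperplane or space.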
In e‑commerce applications, Alibaba's B2B product knowledge graph is described. It represents products with CPV (category‑property‑value) descriptions, which enable scene‑based recommendations, theme‑venue construction, and automated workflow integration that reduces manual effort by over 80%.
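One way to picture scene‑based selection is to describe each scene as a set of (category, property, value) constraints and pick the products that satisfy any of them. The sketch below assumes CPV means category, property, value; the product records, the `summer_camping` scene, and its rules are entirely hypothetical and only illustrate the matching idea, not Alibaba's actual pipeline.

```python
from typing import Dict, List, Tuple

CPV = Tuple[str, str, str]  # (category, property, value)

# Hypothetical product catalog.
PRODUCTS: List[Dict] = [
    {"id": "p1", "category": "tent", "properties": {"season": "summer", "capacity": "4-person"}},
    {"id": "p2", "category": "tent", "properties": {"season": "winter", "capacity": "2-person"}},
    {"id": "p3", "category": "cooler", "properties": {"type": "portable"}},
    {"id": "p4", "category": "desk", "properties": {"material": "wood"}},
]

# Hypothetical scene definition expressed as CPV constraints.
SCENE_RULES: Dict[str, List[CPV]] = {
    "summer_camping": [("tent", "season", "summer"), ("cooler", "type", "portable")],
}


def matches(product: Dict, cpv: CPV) -> bool:
    """True if the product is in the category and has the given property value."""
    category, prop, value = cpv
    return product["category"] == category and product["properties"].get(prop) == value


def products_for_scene(scene: str, products: List[Dict] = PRODUCTS) -> List[str]:
    """Return IDs of products that satisfy at least one rule of the scene."""
    rules = SCENE_RULES.get(scene, [])
    return [p["id"] for p in products if any(matches(p, rule) for rule in rules)]
```

With this data, `products_for_scene("summer_camping")` selects the summer tent and the portable cooler and skips the winter tent and the desk.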
The article concludes with a thank‑you note and references to further reading.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.