Construction of a Second‑Hand E‑commerce Knowledge Graph and Its Application in Pricing Models
This article explains how a knowledge graph for second‑hand e‑commerce is built—from data extraction and entity, attribute, and relation mining to ontology construction, entity alignment, and graph integration—and describes how the resulting graph supports personalized recommendation, search optimization, and statistical or regression‑based pricing models.
1. Overview of Knowledge Graphs
This presentation is divided into four parts: an overview of knowledge graphs, the construction process, the specific second‑hand e‑commerce knowledge graph, and its use in pricing models.
A knowledge graph is a network‑structured repository that represents real‑world entities, their attributes, and the relationships among them, originally introduced by Google in 2012 to improve search.
The basic components of a knowledge graph are entities, attributes, and relationships, often expressed as triples (entity‑relationship‑entity or entity‑attribute‑value). Higher‑level concept nodes, organized into an ontology, group similar entities together.
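The triple representation above can be sketched in a few lines of Python. The entity and relation names here are illustrative examples, not taken from any actual graph:

```python
# Triples stored as (subject, relation, object) tuples.
triples = [
    ("iPhone 12", "is_a", "smartphone"),     # entity-relationship-entity
    ("iPhone 12", "brand", "Apple"),         # entity-attribute-value
    ("iPhone 12", "condition", "90% new"),
]

def neighbors(graph, entity):
    """Return all (relation, object) pairs attached to an entity."""
    return [(r, o) for s, r, o in graph if s == entity]

print(neighbors(triples, "iPhone 12"))
```

Even production graph stores (RDF triple stores, property graphs) are elaborations of this same subject–relation–object shape.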
2. Knowledge Graph Construction
Knowledge graphs can be open‑domain (e.g., Google’s) or vertical domain‑specific (e.g., finance, e‑commerce). Construction starts with data processing, drawing on structured data from databases as well as semi‑structured or unstructured data such as product titles, descriptions, and images.
Key extraction tasks include named entity recognition (NER) for entities, attribute extraction, and relation extraction, using rule‑based, machine‑learning, or deep‑learning methods. Subsequent steps involve entity alignment (e.g., matching Chinese and English names) and entity disambiguation (e.g., distinguishing "Apple" the fruit from "Apple" the company).
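As a minimal illustration of the rule‑based end of that spectrum, attribute extraction from a listing title can be done with hand‑written patterns. The patterns and attribute names below are hypothetical examples, not the production rules:

```python
import re

# Hypothetical hand-written patterns for second-hand listing titles.
PATTERNS = {
    "condition": re.compile(r"(\d{2}% new|brand new|like new)"),
    "storage":   re.compile(r"(\d+\s?(?:GB|TB))"),
}

def extract_attributes(title):
    """Return the first match for each attribute pattern found in the title."""
    attrs = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(title)
        if match:
            attrs[key] = match.group(1)
    return attrs

print(extract_attributes("Apple iPhone 12 128GB 95% new"))
```

Rule sets like this are precise but brittle, which is why they are typically complemented by machine‑learning or deep‑learning extractors as described above.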
Ontology extraction creates higher‑level concepts (e.g., "company"), and similarity calculations help cluster related entities. After quality assessment, the graph is refined and may be expanded through knowledge inference.
3. Second‑Hand E‑commerce Knowledge Graph
Building the graph involves four aspects: business understanding, graph design, algorithms, and implementation.
Characteristics of second‑hand e‑commerce include sparse, noisy data, diverse item conditions, and significant price variance, which necessitate specialized attribute extraction such as item condition, appearance, and refurbishment status.
Construction proceeds by first building a product‑level tree‑shaped graph, extracting term‑level entities, then K‑V (key‑value) attributes, and finally structuring these into a tag‑tree ontology.
Item‑level understanding involves extracting "item words" (core product terms) and "tag words" (detailed attributes) from structured data, titles, descriptions, and images, supporting personalized recommendation and ranking features.
Tag‑tree structuring aggregates discrete key‑value pairs into a hierarchical ontology, enabling query understanding and intelligent search.
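A minimal sketch of that aggregation step, assuming the K‑V pairs have already been attached to a category (all category, key, and value names here are made up for illustration):

```python
from collections import defaultdict

# Flat (category, key, value) facts mined from listings -- illustrative data.
kv_pairs = [
    ("Phones", "brand", "Apple"),
    ("Phones", "brand", "Samsung"),
    ("Phones", "storage", "128GB"),
    ("Laptops", "brand", "Lenovo"),
]

def build_tag_tree(pairs):
    """Aggregate discrete K-V pairs into a category -> key -> values hierarchy."""
    tree = defaultdict(lambda: defaultdict(set))
    for category, key, value in pairs:
        tree[category][key].add(value)
    # Freeze into plain dicts with sorted value lists for stable output.
    return {c: {k: sorted(v) for k, v in kv.items()} for c, kv in tree.items()}

tree = build_tag_tree(kv_pairs)
```

Once the pairs are rolled up this way, a query term can be matched against tree levels (category, key, or value) to infer search intent.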
Product anchoring matches products to ontology nodes using classification, title, description, and image data, assigning weights to resolve ambiguities and generate complete entity representations.
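The weighted matching can be sketched as a simple linear combination of per‑signal scores. The signal weights and node names below are hypothetical, chosen only to show the mechanics:

```python
# Illustrative weights for each matching signal (not the production values).
WEIGHTS = {"category": 0.5, "title": 0.3, "image": 0.2}

def anchor_product(signal_scores, weights=WEIGHTS):
    """Pick the ontology node with the highest weighted match score.

    signal_scores: {node: {signal: score in [0, 1]}}
    """
    def total(node):
        return sum(weights[s] * signal_scores[node].get(s, 0.0) for s in weights)
    return max(signal_scores, key=total)

# Two candidate ontology nodes with ambiguous matches -- made-up scores.
scores = {
    "Phones/Apple/iPhone 12": {"category": 1.0, "title": 0.9, "image": 0.8},
    "Phones/Apple/iPhone 11": {"category": 1.0, "title": 0.4, "image": 0.5},
}
best = anchor_product(scores)
```

When the category signal ties (as here), the title and image scores break the ambiguity, which is the role the weights play in the anchoring step.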
4. Application in Pricing Models
The graph supports price estimation for second‑hand items by identifying price‑sensitive attributes, creating "standardized" second‑hand product IDs, and estimating price intervals using statistical methods (e.g., Box‑Cox transformation, normal‑distribution based interval calculation) or regression models that embed graph‑derived features.
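The statistical interval estimation can be sketched with SciPy: Box‑Cox‑transform the observed sale prices of a standardized product ID, compute a normal‑distribution interval in the transformed space, and invert the transform. The prices and the 95% level below are illustrative assumptions:

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Illustrative sale prices for one standardized second-hand product ID.
prices = np.array([520.0, 610.0, 480.0, 700.0, 560.0, 650.0, 590.0, 530.0])

# Box-Cox transform toward normality; lambda is fit by maximum likelihood.
transformed, lam = stats.boxcox(prices)

# Normal-distribution interval (~95%) in the transformed space.
mu, sigma = transformed.mean(), transformed.std(ddof=1)
lo_t, hi_t = mu - 1.96 * sigma, mu + 1.96 * sigma

# Invert the Box-Cox transform back to the price scale.
lo, hi = inv_boxcox(lo_t, lam), inv_boxcox(hi_t, lam)
```

Because the interval is computed in the transformed (approximately normal) space and then mapped back, the resulting price range is asymmetric around the mean, which tends to fit skewed second‑hand price data better than a raw normal interval.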
For non‑standardized items, the approach finds the top‑N similar items in the graph, aggregates their sale prices, removes outliers, and derives a price range through inverse transformations.
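A rough sketch of that aggregation, assuming the top‑N similar items and their sale prices have already been retrieved from the graph; here outliers are trimmed with the IQR rule (the specific rule and the sample prices are illustrative assumptions):

```python
import statistics

def estimate_price_range(similar_prices, k=1.5):
    """Derive a (low, mid, high) price range from similar items' sale prices,
    dropping outliers outside [Q1 - k*IQR, Q3 + k*IQR]."""
    prices = sorted(similar_prices)
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    kept = [p for p in prices if q1 - k * iqr <= p <= q3 + k * iqr]
    return min(kept), statistics.median(kept), max(kept)

# Made-up sale prices of the top-N similar items; 2000 is an obvious outlier.
low, mid, high = estimate_price_range([480, 500, 510, 520, 530, 2000])
```

In practice the aggregation would run on the same transformed scale as the standardized‑ID case, with the final range obtained through the inverse transformation mentioned above.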
Overall, the knowledge graph enables both data‑driven statistical pricing and machine‑learning‑based regression pricing for second‑hand e‑commerce.
Author Introduction
Zhang Qingnan, Algorithm Architect, leads the foundational model team at Zhuanzhuan, with prior experience as a senior recommendation algorithm engineer at Dangdang.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.