Meituan's To‑Store Comprehensive Knowledge Graph: Construction, Applications, and Future Directions
This talk introduces Meituan's To‑Store Comprehensive Knowledge Graph (GENE), which centers on user demand nodes across diverse local‑life industries. It details the graph's multi‑layered construction, data‑mining pipelines, and model‑driven entity and relation extraction; its practical applications in search, recommendation, and intelligent display; and plans for future expansion.
Meituan's to‑store comprehensive business covers many local‑life categories such as leisure, beauty, parenting, wedding, and pets. To improve supply‑demand matching efficiency, Meituan built a user‑demand‑centric knowledge graph (GENE) that links users, merchants, products, and content.
The presentation is divided into four parts: an overview of Meituan's to‑store business, the design and construction of the knowledge graph, practical applications, and future outlook.
1. To‑Store Business Overview
Meituan provides a wide range of local services, focusing on the user’s in‑store consumption scenario. User decision making is modeled in five stages—interest, consideration, evaluation, purchase, and fulfillment—where the first two stages generate scene‑level and concrete demands that must be linked to merchants and products.
Knowledge graphs are ideal for connecting these demand nodes with supply entities, forming the backbone of the GENE system.
2. Knowledge Graph Construction Scheme
The graph consists of six layers: scene‑demand layer, scene‑element layer, concrete‑demand layer, demand‑object layer, industry‑system layer, and supply layer.
Construction Challenges
User demand diversity: a multi‑dimensional, hierarchical graph schema was designed to capture varied needs.
Complexity of local‑life industries: over a hundred industries require reusable extraction pipelines and few‑shot learning to accelerate graph building.
Strict quality requirements: combining multi‑source, multi‑modal data with multiple mining methods ensures node and relation accuracy.
Industry‑System Layer
A pre‑existing category tree is enriched with expert‑defined attributes (e.g., repurchase cycle, distance preference). Multi‑source data (merchant name, product description, UGC, merchant portrait) are encoded using BERT, Doc2Vec + Self‑Attention, and one‑hot embeddings, then fused for accurate category classification.
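The multi‑source fusion step above can be sketched as follows. This is a minimal illustration, not Meituan's implementation: `encode_text` is a hash‑seeded stand‑in for the real BERT / Doc2Vec + Self‑Attention encoders, and the source names and dimensions are assumptions chosen only to show how per‑source embeddings and a one‑hot portrait feature are concatenated before classification.

```python
import hashlib
import numpy as np

def encode_text(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in for the real BERT / Doc2Vec encoders: a deterministic
    hash-seeded embedding, just so the fusion step is runnable."""
    seed = int(hashlib.md5(text.encode("utf-8")).hexdigest()[:8], 16)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def one_hot(index: int, size: int) -> np.ndarray:
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec

def fuse_features(merchant_name: str, product_desc: str, ugc: str,
                  portrait_id: int, n_portraits: int = 4) -> np.ndarray:
    """Encode each source separately, then concatenate into one fused
    vector that a downstream category classifier would consume."""
    return np.concatenate([
        encode_text(merchant_name),          # merchant name text
        encode_text(product_desc),           # product description text
        encode_text(ugc),                    # user-generated content
        one_hot(portrait_id, n_portraits),   # categorical merchant portrait
    ])

# Hypothetical inputs, for illustration only.
fused = fuse_features("Happy BBQ House", "charcoal grill set",
                      "great for groups", 2)
```

In practice each encoder would be a trained model and the fused vector would feed a classifier head; the fixed-width concatenation shown here is just the simplest way to combine heterogeneous sources.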
Demand‑Object Layer
Objects and attributes are mined in coarse‑grained (keyword extraction → clustering → dimension refinement) and fine‑grained stages, using unsupervised expansion (word‑vector similarity) and supervised BERT + CRF labeling. Statistical features and BERT embeddings are later combined to infer hierarchical and synonym relations.
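The unsupervised expansion step can be sketched as below, assuming pre-trained word vectors are available. The toy 2-d vectors and the 0.8 threshold are illustrative assumptions; the idea is simply that any vocabulary term whose vector is close to a seed term's vector joins the mined set.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand_seeds(seeds, vocab_vectors, threshold=0.8):
    """Add any vocabulary term whose vector is close (cosine >= threshold)
    to at least one seed term's vector."""
    expanded = set(seeds)
    for word, vec in vocab_vectors.items():
        if word not in expanded and any(
                cosine(vec, vocab_vectors[s]) >= threshold for s in seeds):
            expanded.add(word)
    return expanded

# Toy 2-d "word vectors"; a real system would use trained embeddings.
vocab = {
    "spa":     np.array([1.00, 0.00]),
    "massage": np.array([0.95, 0.10]),
    "pizza":   np.array([0.00, 1.00]),
}
expanded = expand_seeds({"spa"}, vocab)
```

The supervised stage described above (BERT + CRF sequence labeling) then cleans and extends what this cheap expansion recalls.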
Concrete‑Demand Layer
Candidate concrete demands are generated via pattern‑based composition (e.g., “outdoor BBQ”) and phrase mining (dependency parsing, AutoPhrase). A Wide & Deep model (global statistics + BERT semantics) judges candidate quality before manual review.
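The Wide & Deep quality judge can be sketched with plain numpy. Assumptions are flagged throughout: the statistical features, embedding size, and randomly initialized weights are placeholders, while the structure mirrors the described design, a linear "wide" part over global statistics plus a small MLP "deep" part over a semantic embedding, with the two logits summed into one quality probability.

```python
import numpy as np

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))

def wide_deep_score(stats, embedding, w_wide, W1, w2):
    """Wide part: linear model over global statistics (e.g. phrase
    frequency). Deep part: one-hidden-layer MLP over a semantic embedding.
    The two logits are summed and squashed into a quality probability."""
    wide_logit = float(stats @ w_wide)
    hidden = np.maximum(0.0, embedding @ W1)  # ReLU hidden layer
    deep_logit = float(hidden @ w2)
    return sigmoid(wide_logit + deep_logit)

rng = np.random.default_rng(0)
stats = np.array([0.5, 1.2, 0.3])    # made-up statistical features
embedding = rng.standard_normal(8)   # stand-in for a BERT embedding
w_wide = rng.standard_normal(3)      # untrained illustrative weights
W1 = rng.standard_normal((8, 4))
w2 = rng.standard_normal(4)
score = wide_deep_score(stats, embedding, w_wide, W1, w2)
```

A trained model of this shape would output the probability that a candidate phrase such as "outdoor BBQ" is a well-formed concrete demand, with low scorers filtered out before manual review.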
Relations between concrete demands and attributes are built using pattern extraction and a BERT‑based sentence‑level relation extractor.
Linking concrete demands to supply (merchants, products, content) is treated as an entity‑linking problem with recall, ranking, and aggregation stages, using BERT sentence‑pair classification and active learning.
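The recall-then-rank structure of the entity-linking stage can be sketched as follows. Token-overlap recall and a Jaccard scorer are stand-ins chosen for runnability; per the text, the production ranker is a BERT sentence-pair classifier, and the supply strings here are invented examples.

```python
def jaccard(a: str, b: str) -> float:
    """Toy relevance scorer; the real system uses BERT sentence-pair scores."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def link_demand_to_supply(demand, supplies, scorer, top_k=2):
    """Three-stage linking sketch: (1) recall supplies sharing any token
    with the demand, (2) rank the recalled candidates with the scorer,
    (3) keep the top-k links for aggregation."""
    demand_tokens = set(demand.split())
    recalled = [s for s in supplies if demand_tokens & set(s.split())]
    ranked = sorted(recalled, key=lambda s: scorer(demand, s), reverse=True)
    return ranked[:top_k]

links = link_demand_to_supply(
    "outdoor BBQ",
    ["riverside BBQ garden", "indoor spa", "BBQ grill buffet"],
    jaccard,
)
```

Active learning, as mentioned above, would then route low-confidence pairs from the ranking stage to annotators to grow the training set.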
Scene‑Element Layer
Scene elements (person, time, place, purpose) are extracted from UGC using the same unsupervised + supervised pipeline as demand objects, then linked to concrete demands via template extraction and BERT relation modeling.
Scene‑Demand Layer
Scene demands are assembled from one or more scene elements (e.g., “weekend + friends” → “outdoor gathering”). A compatibility scoring model filters out contradictory combinations.
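One simple way to realize the compatibility filter is to score each combination by its weakest pairwise link, so a single contradictory pair vetoes the whole combination. The co-occurrence values, element names, and 0.3 threshold below are illustrative assumptions, not Meituan's model.

```python
from itertools import combinations

def compatibility_score(elements, cooccurrence):
    """Score a scene-element combination by its weakest pairwise
    co-occurrence; one contradictory pair (score 0) sinks it."""
    pairs = combinations(sorted(elements), 2)
    return min((cooccurrence.get(p, 0.0) for p in pairs), default=1.0)

def filter_combinations(candidates, cooccurrence, threshold=0.3):
    return [c for c in candidates
            if compatibility_score(c, cooccurrence) >= threshold]

# Illustrative co-occurrence strengths, keyed by sorted element pairs.
cooc = {("friends", "weekend"): 0.9, ("friends", "solo"): 0.0}
kept = filter_combinations([("weekend", "friends"), ("solo", "friends")], cooc)
```

Here "weekend + friends" survives while the contradictory "solo + friends" is dropped, mirroring how implausible scene combinations are filtered before becoming scene demands.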
3. Data Accumulation
After more than a year of development, the graph covers 60+ industries and contains more than 400,000 demand nodes, billions of edges, and dozens of relation types, with precision and recall both above 90%.
4. Application Practice
The graph is applied in search (recall and explainability, e.g., medical‑beauty queries), recommendation (recall and ranking, integrated into Meituan’s “You May Like” streams), and intelligent information display (supply aggregation, tag filtering, and recommendation reasons).
New Industry Exploration – Script‑Murder (Jùběn Shā)
For the rapidly growing script‑murder market, the graph standardizes supply by extracting script names, attributes, categories, and linking them to merchants and content through rule‑based, semantic, and multimodal (text + image) matching pipelines.
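The multimodal matching step can be sketched as a weighted fusion of text-embedding and image-embedding similarities. Everything concrete here is assumed for illustration: the 0.6/0.4 weights, the toy embeddings, and the use of plain cosine similarity in place of whatever encoders the production pipeline uses.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def multimodal_match_score(text_a, text_b, img_a, img_b,
                           w_text=0.6, w_img=0.4):
    """Weighted fusion of text-embedding and image-embedding similarity
    between a script entity and a merchant/content candidate; the weights
    are illustrative, not production values."""
    return w_text * cosine(text_a, text_b) + w_img * cosine(img_a, img_b)

# Identical embeddings on both sides should give a perfect score of 1.0.
t = np.array([0.2, 0.8, 0.1])
i = np.array([0.5, 0.5])
score = multimodal_match_score(t, t, i, i)
```

Rule-based matches (e.g. exact script-name hits) would typically short-circuit this scorer, which handles the fuzzier semantic and visual cases.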
Results include a new script‑murder category, improved recommendation quality, and richer information display (tags, ranking, script cards).
5. Future Outlook
Future work will extend the graph from supply‑centric to user‑centric by adding user nodes, deepen industry coverage, expand to the full decision‑making chain (including fulfillment), and continue improving knowledge representation and computation for broader applications.
Q&A
Q: How are manually defined triple templates combined with algorithmic extraction? A: High‑quality template triples are directly stored and also serve as training samples for models, achieving >95 % accuracy.
Q: How long does it take to expand to a new industry? A: It varies with industry complexity; the established pipeline accelerates the process but exact timing depends on domain specifics.
Q: Are tag recalls offline? How does the graph contribute to recall? A: Both offline label generation and online graph services are used; tag recall leverages demand nodes and their supply links, improving CTR across multiple verticals.
Thank you for attending.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.