Knowledge‑Enhanced Product Understanding with Meituan Brain: Building and Applying a Large‑Scale Product Knowledge Graph
This talk presents Meituan Brain's large‑scale product knowledge graph, explains how knowledge‑enhanced models improve product title parsing, category association, and sample governance, and demonstrates the resulting gains in search, recommendation, and other downstream services, all while keeping the system online‑controllable and explainable.
Meituan Brain is an ongoing effort to construct the world’s largest knowledge graph for the life‑service domain, focusing on product information. By integrating AI techniques, the graph provides structured data such as brand, flavor, origin and hierarchical categories, which are essential for new‑retail scenarios like delivery, flash‑sale, grocery and pharmacy.
The presentation first introduces the overall architecture of the product knowledge graph, describing the hierarchical and attribute layers, the corpus construction pipeline (corpus → sample collection → model training → prediction → graph generation), and the importance of sample collection and model training for knowledge‑enhanced learning.
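The data flow of that pipeline can be sketched as a chain of stages. This is a toy illustration only: the stage names mirror the pipeline above, but every body here is a hypothetical placeholder (a "first token is the brand" rule standing in for real sample collection and model training).

```python
def collect_samples(corpus):
    # Sample collection: pair each raw title with a weak label
    # (hypothetically, the first token as the brand).
    return [(title, title.split()[0]) for title in corpus]

def train(samples):
    # Stand-in for model training: remember which tokens were brands.
    return {label for _, label in samples}

def predict(model, corpus):
    # Prediction: emit (product, relation, value) triples.
    return [(title, "brand", tok)
            for title in corpus
            for tok in title.split() if tok in model]

def build_graph(corpus):
    # Graph generation: group predicted triples by product.
    triples = predict(train(collect_samples(corpus)), corpus)
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, {})[rel] = tail
    return graph
```

In the real system each stage is a learned model rather than a rule, but the corpus → samples → model → predictions → graph flow is the same.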
For product title parsing, a baseline BERT+CRF model is insufficient due to domain‑specific vocabularies, ambiguous terms, and noisy annotations. Various lexical‑enhancement methods (soft lexicon, LexBert) are evaluated, showing limited improvement. The authors then propose graph‑based knowledge enhancement: incorporating graph nodes as a lexicon, encoding type‑specific embeddings, and fusing relational information via graph neural networks. To achieve online controllability, a graph‑anchor approach is introduced, using category knowledge as anchors to compute relevance scores between words and products, enabling explicit disambiguation and statistical‑feature‑based interventions.
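The graph‑anchor idea can be illustrated with a minimal sketch. Assume the graph stores, for each word, co‑occurrence counts with category anchors; relevance is then a simple conditional estimate P(category | word), and a tunable online threshold implements the statistical‑feature‑based intervention. The data, function names, and threshold below are all hypothetical, not the talk's actual implementation.

```python
# Hypothetical toy statistics from the graph:
# word -> {category anchor: co-occurrence count}.
word_category_counts = {
    "mint": {"gum": 80, "toothpaste": 40, "plant": 5},
    "sugar-free": {"gum": 60, "beverage": 30},
}

def anchor_relevance(word, category):
    """Score a title word against a product's category anchor.

    Relevance ~ P(category | word), estimated from co-occurrence
    counts on the graph. A high score means the word is a plausible
    attribute for this category; a low score flags ambiguity
    (e.g. "mint" the plant vs. mint the flavor).
    """
    counts = word_category_counts.get(word, {})
    total = sum(counts.values())
    return counts.get(category, 0) / total if total else 0.0

def accept_attribute(word, category, threshold=0.3):
    # Online-controllable intervention: suppress a parse whenever
    # the anchor score falls below an adjustable threshold.
    return anchor_relevance(word, category) >= threshold
```

Because the decision is a threshold on an explicit statistic rather than a hidden activation, operators can inspect and adjust it online, which is the controllability and explainability the talk emphasizes.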
Experiments demonstrate that the graph‑anchor method yields the largest performance boost (about +4 percentage points), compared with soft lexicon (+1 pt) and LexBert (+1.5 pt). The method also supports online adjustments, making the model more explainable and easier to maintain.
The talk then covers product‑category association, where the task is to decide whether a candidate category is an “is‑A” relation for a given product. By leveraging the knowledge graph’s hierarchical and synonym relations, the authors improve disambiguation and achieve higher accuracy.
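One way the hierarchy and synonym relations help with is‑A decisions is by expanding a product's category into all the names it can legitimately match. The fragment below is a hypothetical sketch of that expansion, not the authors' model, which presumably combines such graph signals with a learned classifier.

```python
# Hypothetical fragment of the graph's category hierarchy and synonyms.
parent = {"green tea": "tea", "tea": "beverage"}
synonyms = {"soda": {"soft drink", "pop"}}

def expand(category):
    """All names a category can be matched under: itself, its
    registered synonyms, and every ancestor in the hierarchy."""
    names = {category} | synonyms.get(category, set())
    node = category
    while node in parent:
        node = parent[node]
        names.add(node)
    return names

def is_a(product_category, candidate):
    # The candidate is an is-A match if it appears anywhere in the
    # product category's expanded ancestor/synonym set.
    return candidate in expand(product_category)
```

Resolving synonyms and ancestors before matching is what gives the disambiguation gain: "green tea" correctly associates with "beverage" even though the strings never overlap.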
Sample governance is presented as a two‑stage loop: (1) active‑learning‑driven data sampling, where model confidence is calibrated with label smoothing and representative samples are selected via clustering; (2) error‑sample detection, using cross‑validation consistency, forgetting‑count statistics, and multi‑task auxiliary detectors to identify mislabeled instances. These techniques reduce annotation cost while steadily improving model quality.
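The forgetting‑count signal from stage (2) is easy to sketch. A "forgetting event" is a sample the model classified correctly in one epoch and incorrectly in the next; samples forgotten repeatedly are strong candidates for relabeling review. This minimal version assumes per‑epoch correctness flags are already recorded.

```python
def forgetting_counts(correctness_per_epoch):
    """Count forgetting events per sample across training epochs.

    correctness_per_epoch: list of per-epoch boolean lists, one
    flag per sample (True = classified correctly that epoch).
    A transition True -> False between consecutive epochs counts
    as one forgetting event for that sample.
    """
    n = len(correctness_per_epoch[0])
    counts = [0] * n
    for prev, curr in zip(correctness_per_epoch, correctness_per_epoch[1:]):
        for i in range(n):
            if prev[i] and not curr[i]:
                counts[i] += 1
    return counts

history = [
    [True,  False, True],   # epoch 1
    [False, False, True],   # epoch 2: sample 0 forgotten
    [True,  True,  True],   # epoch 3
    [False, True,  True],   # epoch 4: sample 0 forgotten again
]
# Sample 0 accumulates two forgetting events and is flagged
# for annotation review.
```

In the talk's loop this detector runs alongside cross‑validation consistency checks and auxiliary multi‑task detectors, so no single noisy signal decides which labels get re‑examined.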
Finally, the application of the product graph is illustrated in search (query understanding, recall filtering), ranking (feature augmentation to demote irrelevant items), and product presentation (filter options, recommendation reasons, tags, leaderboards). The authors report consistent gains across these downstream tasks and emphasize the graph’s role in a virtuous cycle of better models and richer knowledge.
In summary, the knowledge‑enhanced approach significantly boosts model performance, provides interpretability, and enables online controllability, making it well‑suited for industrial AI systems.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.