Enhancing Commercial Search with Knowledge Graphs and Large‑Model Techniques

This article describes how a commercial search platform iteratively upgrades its system by structuring business knowledge into a knowledge graph, applying multi‑stage entity extraction (CRF, Electra‑CRF, GLM‑3, OCR), and leveraging large language models to improve relevance, user experience, and revenue.

58 Tech

Sep 23, 2024

As commercial search matures, deeper business understanding is required to boost user experience and revenue. The article first outlines the business scenarios of the 58.com platform, highlighting user entry points, intent analysis, and the differences among life‑service, housing, and recruitment searches.

Key challenges identified include unstructured merchant information and insufficient relevance of search results, caused by high‑recall but low‑precision term‑based and vector‑based recall pipelines.

The proposed solution is to structure business knowledge. A multi‑layer knowledge graph is built, consisting of classification, atomic, composite, and user‑demand layers, to represent services, job types, and guarantees.
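As a rough illustration of how such a layered graph could be represented, here is a minimal sketch. The article names the four layers but does not describe a schema, so the `Layer`, `Node`, and `KnowledgeGraph` names, the parent links between layers, and the sample nodes are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    # The four layer names come from the article; the comments are assumed readings.
    CLASSIFICATION = "classification"  # assumed: business categories
    ATOMIC = "atomic"                  # assumed: single concepts
    COMPOSITE = "composite"            # assumed: phrases built from atomic nodes
    USER_DEMAND = "user_demand"        # assumed: needs expressed by users


@dataclass
class Node:
    name: str
    layer: Layer
    parents: list = field(default_factory=list)  # names of parent nodes


class KnowledgeGraph:
    """Hypothetical in-memory graph keyed by node name."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, layer, parents=()):
        # Every parent must already exist, so the graph stays acyclic.
        for parent in parents:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent: {parent}")
        self.nodes[name] = Node(name, layer, list(parents))
        return self.nodes[name]

    def ancestors(self, name):
        """Return all ancestors in breadth-first order, without duplicates."""
        seen, queue = [], list(self.nodes[name].parents)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self.nodes[current].parents)
        return seen


# Sample data invented for illustration only.
kg = KnowledgeGraph()
kg.add("moving", Layer.CLASSIFICATION)
kg.add("move", Layer.ATOMIC, parents=["moving"])
kg.add("piano", Layer.ATOMIC, parents=["moving"])
kg.add("move piano", Layer.COMPOSITE, parents=["move", "piano"])
kg.add("relocate a piano across town", Layer.USER_DEMAND, parents=["move piano"])
```

Walking `ancestors` from a user-demand node up to its category is one way a query could be mapped onto structured entities; a production graph would more likely live in a graph store than in a dictionary.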

Entity extraction evolves through four stages: (1) quick validation using CRF on hot queries (≈75% accuracy), (2) batch extraction with Electra‑CRF (≈90% accuracy, 100k+ entities), (3) large‑model extraction using GLM‑3 fine‑tuned on service‑entity prompts (≈92% accuracy, handling non‑contiguous entities), and (4) multimodal extraction via PaddleOCR on merchant images (yielding a 20% UV increase).
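To make the limitation of sequence-labeling stages concrete, the sketch below decodes a tag sequence into entities. It assumes the CRF-style models emit BIO tags, which the article does not state; `decode_bio`, the `SERVICE` label, and the sample sentence are illustrative only.

```python
def decode_bio(tokens, tags):
    """Collect contiguous B-/I- runs into (label, text) entities."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current:
                entities.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continue the entity in progress.
            current.append(token)
        else:
            # "O" tag or an inconsistent I- tag: close the entity in progress.
            if current:
                entities.append((label, " ".join(current)))
            current, label = [], None
    if current:
        entities.append((label, " ".join(current)))
    return entities


tokens = ["repair", "and", "install", "automatic", "doors"]
tags = ["B-SERVICE", "O", "B-SERVICE", "I-SERVICE", "I-SERVICE"]
entities = decode_bio(tokens, tags)
# entities == [("SERVICE", "repair"), ("SERVICE", "install automatic doors")]
```

Because each entity must be a contiguous run of tokens, this decoding cannot produce "repair automatic doors" from the sentence above. A generative model that writes the phrase out directly is not bound by that constraint, which is consistent with the article's note that the large-model stage handles non-contiguous entities.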

Large models further upgrade the pipeline: they provide stronger semantic understanding, few‑shot learning, and a generative paradigm suitable for NER and promise extraction. Two upgrade targets are identified—service‑entity NER and service‑promise extraction—both benefiting from GLM‑3 fine‑tuning.

For service‑entity extraction, a prompt is crafted to guide the model to output verb‑object service phrases; the prompt is shown below:

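prompt (translated from the Chinese original):
You are an excellent text keyword extraction tool. I am a user looking for local life services, and I am reading a merchant's description to find the service items the merchant can provide. A merchant service item is service content the merchant can provide that solves a problem I face in daily life, such as "same-city moving" or "nearby locksmith"; a merchant service item is not a service promise the merchant makes when providing the service, nor a feature of the merchant's service. Based on my definition of merchant service items above, please help me extract merchant service item keywords from the merchant text.
Requirements:
1) Fully understand the merchant's description and stay faithful to its original wording;
2) Do not output duplicate merchant service item keywords;
3) Do not output the merchant's service promises, such as "on-site service" or "safety guarantee";
4) Each output term is a compound phrase with a verb-object structure, and each phrase describes one specific service or action, such as "repair automatic doors", "same-city moving", "roadside assistance";
5) Output format: separate the service items with "," and begin the output with "Keywords";
6) Do not output place names, such as "Chengdu".
After answering, do not explain the reasons for your answer.
Here is an example to help you better understand my request:
Example:
Here is a merchant description:
High-price buy-back of gold jewelry, platinum and palladium, pure gold, thousand-pure gold, gold bars, necklaces, bracelets, earrings, rings, diamond rings. On-site buy-back supported | free on-site gold pickup, no call-out fee, free appraisal, no lowballing, international market price for gold, gold and silver jewelry buy-back [Gold] 1. Gold buy-back: thousand-pure gold, pure gold, 24k, 22k gold, 20k gold, 18k gold, ten-thousand-pure gold, 14k gold, gold bars, gold bricks, and all other gold.
Keywords:
buy back gold jewelry, buy back platinum, buy back palladium, buy back thousand-pure gold, buy back gold bars, buy back necklaces, buy back bracelets, buy back earrings, buy back rings, buy back diamond rings, buy back gold, buy back gold and silver jewelry, pure gold buy-back, gold bar buy-back, gold brick buy-back
Here is a merchant description: {merchant description}
response: {manually annotated service entities}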
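The prompt's format rules (a fixed keyword prefix, separator-delimited items, no duplicates, no place names) suggest a simple post-processing check on the model's output. The sketch below is one possible way to do that, not the authors' code: `parse_keywords` and the `place_names` argument are hypothetical, and accepting both the Chinese prefix "关键词" and the English "Keywords" is an assumption so that the function works on either version of the prompt.

```python
import re

# Prefix the prompt asks the model to start with ("关键词" means "Keywords").
PREFIXES = ("关键词", "Keywords")
# Accept ASCII commas plus the Chinese separators used in the prompt's example.
SEPARATORS = r"[,，、]"


def parse_keywords(response, place_names=frozenset()):
    """Parse a model response into a de-duplicated list of service keywords."""
    text = response.strip()
    for prefix in PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    else:
        # The output does not follow the requested format; reject it.
        return []
    text = text.lstrip(":： \n")

    seen, result = set(), []
    for item in re.split(SEPARATORS, text):
        item = item.strip()
        # Enforce rule 2 (no duplicates) and rule 6 (no place names).
        if not item or item in seen or item in place_names:
            continue
        seen.add(item)
        result.append(item)
    return result


keywords = parse_keywords(
    "关键词：同城搬家、附近开锁、同城搬家、成都", place_names={"成都"}
)
# keywords == ["同城搬家", "附近开锁"]
```

A check like this cannot verify rule 3 (no service promises) or rule 4 (verb-object phrases); those depend on the model itself or on a separate classifier.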

Service‑promise extraction follows a similar pipeline, using a chain‑of‑thought approach to first summarize long descriptions and then extract promise phrases, achieving >80% accuracy after class‑specific fine‑tuning.
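The summarize-then-extract flow can be sketched as two chained model calls. Everything concrete here is assumed for illustration: the article does not give the prompts, so the prompt wording, the `extract_promises` function, and the `fake_llm` stand-in (used so the example runs without a model) are hypothetical.

```python
from typing import Callable, List


def extract_promises(description: str, llm: Callable[[str], str]) -> List[str]:
    """Two-step extraction: summarize a long description, then pull out promises."""
    # Step 1: condense the long description (prompt wording is an assumption).
    summary = llm(
        "Summarize the following merchant description in a few sentences:\n"
        + description
    )
    # Step 2: extract promise phrases from the shorter summary.
    raw = llm(
        "List the service promises in this summary, separated by commas:\n"
        + summary
    )
    promises = []
    for item in raw.split(","):
        item = item.strip()
        if item and item not in promises:
            promises.append(item)
    return promises


def fake_llm(prompt: str) -> str:
    """Stand-in for a fine-tuned model; returns canned text for the demo."""
    if prompt.startswith("Summarize"):
        return "The merchant offers moving with on-site service and free quotes."
    return "on-site service, free quotes, on-site service"


promises = extract_promises("(long merchant description)", fake_llm)
# promises == ["on-site service", "free quotes"]
```

Passing the model in as a callable keeps the pipeline logic testable and makes it easy to swap in a class-specific fine-tuned model per business category.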

The structured knowledge graph is applied across multiple downstream modules: recall (entity‑based inverted index), ranking (entity relevance scores), relevance filtering, creative highlighting, and OCPC bidding. Deployment of the full pipeline resulted in a 6%–10% increase in cash/UV metrics and a daily commercial gain of over 100k.
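For the recall and ranking modules, a minimal sketch of an entity-based inverted index is shown below. The article does not describe the index layout or the scoring formula, so the `EntityIndex` class, the merchant IDs, and the relevance score (the fraction of query entities a merchant matches) are assumptions chosen for simplicity.

```python
from collections import defaultdict


class EntityIndex:
    """Hypothetical inverted index from service entity to merchant IDs."""

    def __init__(self):
        self.postings = defaultdict(set)  # entity -> set of merchant IDs
        self.doc_entities = {}            # merchant ID -> set of entities

    def add(self, doc_id, entities):
        self.doc_entities[doc_id] = set(entities)
        for entity in entities:
            self.postings[entity].add(doc_id)

    def recall(self, query_entities):
        """Return every merchant that matches at least one query entity."""
        hits = set()
        for entity in query_entities:
            hits |= self.postings.get(entity, set())
        return hits

    def relevance(self, doc_id, query_entities):
        """Assumed score: share of query entities the merchant covers."""
        query = set(query_entities)
        if not query:
            return 0.0
        return len(query & self.doc_entities[doc_id]) / len(query)

    def search(self, query_entities):
        """Recall, then rank by relevance (ties broken by merchant ID)."""
        scored = [
            (self.relevance(doc_id, query_entities), doc_id)
            for doc_id in self.recall(query_entities)
        ]
        return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


index = EntityIndex()
index.add("m1", ["move piano", "same-city moving"])
index.add("m2", ["same-city moving"])
index.add("m3", ["unlock door"])
ranked = index.search(["move piano", "same-city moving"])
# ranked == ["m1", "m2"]
```

The same relevance score could also drive relevance filtering by dropping merchants below a threshold, though the threshold and the real scoring features would have to come from the production system.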

Finally, the article lists references to BERT, ELECTRA, NER surveys, and GLM‑3, and provides a brief author bio.


