
Industry Search: Background, Technologies, and Real‑World Applications

This article gives an overview of industry search. It covers the background and the core retrieval and ranking technologies (sparse and dense retrieval, pre‑trained language models, tokenization, NER, adaptive multi‑task training, and re‑ranking models), then walks through five case studies: address analysis, Family‑ID unification, emergency call handling, education photo search, and power knowledge‑base integration.

DataFunTalk

The presentation introduces industry search, beginning with its background and the need to bridge user information needs with large resource repositories, illustrated through e‑commerce query examples.

It explains the fundamental search pipeline, distinguishing sparse retrieval (keyword‑based inverted indexes) from dense retrieval (pre‑trained language‑model embeddings), and outlines the typical stages of recall, coarse ranking, fine ranking, and re‑ranking.
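The contrast between the two recall styles can be sketched in a few lines. This is a minimal illustration, not the pipeline from the talk: the documents are invented, and `toy_embed` is a hypothetical stand-in that hashes character trigrams into a vector, where a real system would call a pre-trained language-model encoder.

```python
import math
import zlib
from collections import defaultdict

# Invented e-commerce style documents, for illustration only.
DOCS = {
    "d1": "red running shoes for men",
    "d2": "waterproof hiking boots",
    "d3": "men trail running sneakers",
}

def build_inverted_index(docs):
    """Sparse side: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def sparse_retrieve(index, query):
    """Score documents by how many distinct query tokens they contain."""
    scores = defaultdict(int)
    for token in set(query.lower().split()):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def toy_embed(text, dim=256):
    """ASSUMPTION: stand-in encoder (hashed character trigrams, L2-normalised).
    A production system would use a pre-trained model here instead."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dense_retrieve(doc_vectors, query, top_k=2):
    """Dense side: rank documents by cosine similarity to the query vector."""
    q = toy_embed(query)
    scored = [
        (doc_id, sum(a * b for a, b in zip(q, vec)))
        for doc_id, vec in doc_vectors.items()
    ]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))[:top_k]

INDEX = build_inverted_index(DOCS)
DOC_VECTORS = {doc_id: toy_embed(text) for doc_id, text in DOCS.items()}

print(sparse_retrieve(INDEX, "runing shoes"))
print(dense_retrieve(DOC_VECTORS, "runing shoes"))
```

With the misspelled query, the inverted index can only match the exact token "shoes", while the vector side still overlaps with "running" at the character level. In a full pipeline, candidates recalled this way would then pass through coarse ranking, fine ranking, and re‑ranking.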

A comparison between consumer‑internet and industrial‑internet search highlights differences in user volume, performance requirements, and algorithmic focus, noting that industrial search often emphasizes recall and relevance over conversion metrics.

The article then surveys related research, describing Alibaba DAMO Academy’s AliceMind hierarchical pre‑training framework, advances in tokenization for task‑specific NLP, cross‑domain unsupervised tokenization, and improvements in named‑entity recognition (NER) for short queries.

It introduces adaptive multi‑task training (MOMETAS) to reduce inference cost by sharing a single encoder across tasks, and presents the ROM pre‑training model for retrieval‑enhanced embeddings, which achieves state‑of‑the‑art results on MS MARCO.
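The inference saving from a shared encoder can be illustrated with a small sketch. Everything here is hypothetical: `SharedEncoder` is a placeholder that computes three hand-made features, and the two task heads are toy rules. The only point is the call pattern, where the expensive encoding step runs once and every task head reuses its output.

```python
from typing import Callable, Dict, List

class SharedEncoder:
    """ASSUMPTION: placeholder for a pre-trained encoder; counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def encode(self, text: str) -> List[float]:
        self.calls += 1
        tokens = text.split()
        n_tokens = len(tokens) or 1
        mean_len = sum(len(t) for t in tokens) / n_tokens
        digit_ratio = sum(c.isdigit() for c in text) / max(len(text), 1)
        # Toy feature vector: [token count, mean token length, digit ratio]
        return [float(len(tokens)), mean_len, digit_ratio]

def has_number_head(features: List[float]) -> bool:
    """Toy task head: does the query contain any digit?"""
    return features[2] > 0.0

def length_bucket_head(features: List[float]) -> str:
    """Toy task head: coarse query-length class."""
    return "short" if features[0] <= 3 else "long"

HEADS: Dict[str, Callable[[List[float]], object]] = {
    "has_number": has_number_head,
    "length_bucket": length_bucket_head,
}

def run_all_tasks(encoder: SharedEncoder, heads, text: str) -> dict:
    """Encode once, then apply every lightweight task head to the same features."""
    features = encoder.encode(text)
    return {name: head(features) for name, head in heads.items()}

encoder = SharedEncoder()
print(run_all_tasks(encoder, HEADS, "No 12 West Lake Road"))
print("encoder passes:", encoder.calls)
```

Running one encoder per task would cost as many forward passes as there are tasks. With a shared encoder the count stays at one per query, however many heads are attached.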

The HLATR re‑ranking model is described as a list‑aware Transformer that fuses multiple classifier outputs, delivering significant gains in the final ranking stage.
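To give a feel for what "list‑aware" fusion means, here is a deliberately simplified stand‑in. It is not the HLATR Transformer: it only normalises each stage's scores across the whole candidate list and combines them with fixed weights. The weights, score values, and function names are assumptions made for illustration.

```python
import statistics

def zscore(values):
    """Normalise scores relative to the other candidates in the same list."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def fuse_and_rerank(candidates, w_retrieval=0.3, w_ranker=0.7):
    """candidates: list of (doc_id, retrieval_score, ranker_score).

    ASSUMPTION: a fixed-weight linear fusion replaces the learned,
    Transformer-based fusion described in the talk.
    """
    if not candidates:
        return []
    retrieval_z = zscore([c[1] for c in candidates])
    ranker_z = zscore([c[2] for c in candidates])
    fused = [
        (c[0], w_retrieval * r + w_ranker * k)
        for c, r, k in zip(candidates, retrieval_z, ranker_z)
    ]
    return sorted(fused, key=lambda item: (-item[1], item[0]))

# Invented scores: a BM25-like retrieval score and a 0..1 ranker score.
candidates = [("a", 12.0, 0.2), ("b", 9.0, 0.9), ("c", 3.0, 0.4)]
for doc_id, score in fuse_and_rerank(candidates):
    print(doc_id, round(score, 3))
```

Because each score is normalised against the rest of the list, a candidate's fused score depends on its competitors rather than on its raw value alone. A learned list‑aware model captures the same dependence with far more capacity.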

Several industry applications are detailed: (1) an address‑analysis product that builds a knowledge graph and a multimodal geographic language model; (2) a Family‑ID system that normalizes disparate address representations to unify customer identities; (3) an emergency‑call solution that combines ASR, query correction, intent detection, and multi‑stage search to pinpoint incident locations; (4) an education photo‑search system that integrates OCR, spell correction, subject prediction, multimodal intent understanding, and vector‑based retrieval to solve exam‑question queries; and (5) a power‑knowledge‑base unified search that handles semi‑structured data, AI‑driven document processing, and downstream QA.
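The Family‑ID idea in item (2) can be sketched as: map each address string to a canonical form, then group customers that share it. The sketch below uses a small hand‑written abbreviation table as a stand‑in for the address‑analysis models described in the talk. The table, the sample records, and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# ASSUMPTION: a tiny rule table; a real system would rely on learned
# address parsing and entity linking rather than fixed abbreviations.
ABBREVIATIONS = {
    "no": "number",
    "rd": "road",
    "st": "street",
    "bldg": "building",
    "apt": "unit",
}

def normalize_address(raw: str) -> str:
    """Lower-case, strip punctuation, and expand known abbreviations."""
    text = re.sub(r"[.,#]", " ", raw.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)

def assign_family_ids(records):
    """Group customer ids by canonical address; each group is one family."""
    groups = defaultdict(list)
    for customer_id, address in records:
        groups[normalize_address(address)].append(customer_id)
    return dict(groups)

records = [
    ("alice", "No. 12 West Lake Rd, Bldg 3, Apt 501"),
    ("bob", "number 12 west lake road building 3 unit 501"),
    ("carol", "No. 8 East Gate St"),
]
for canonical, members in assign_family_ids(records).items():
    print(canonical, "->", members)
```

Here the first two records collapse to the same canonical string and therefore share a family, while the third stays separate. Exact string matching after normalisation is the simplest possible grouping rule and will miss variants that a trained address model could resolve.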

The talk concludes by emphasizing the maturity of these search pipelines and their deployment across numerous Chinese cities and industry sectors.

multimodal, NLP, search ranking, retrieval, pretrained models, address analysis, industry search
