Integrating Large Language Models with Search for Automotive Knowledge Retrieval
This article explores how combining traditional keyword search with large language models (LLMs) enhances understanding of user intent, builds a robust automotive knowledge base, and delivers more accurate, context‑aware answers through a multi‑stage retrieval and generation pipeline.
1. Introduction
Traditional search systems rely on keyword matching and lack the ability to understand user questions or perform secondary answer processing. This article investigates the use of Large Language Models (LLMs) for natural language understanding (NLU) and generation (NLG) to better capture user intent and synthesize more relevant answers.
While LLMs can answer generic questions, serving a vertical domain such as the automotive industry requires deeper, more accurate, and up‑to‑date knowledge. Therefore, a dedicated automotive knowledge base was constructed.
2. Solution Analysis
2.1 Building the Basic Knowledge Base with Traditional Search
Traditional search provides the foundation of the knowledge base because it offers:
Higher answer controllability through precise keyword matching.
Suitability for common knowledge-base requirements such as large-scale data handling, fast queries, and timely updates.
Lower technical risk, thanks to mature search stacks.
LLMs act as an interaction layer between users and the search system, providing:
Understanding of user requests (error correction, key‑point extraction, clarification).
Secondary processing of search results (summarization, analysis, reasoning) while preserving correctness.
Combining both approaches optimizes knowledge‑base construction and query processing, enabling more efficient business problem handling.
2.2 Solution Design
Traditional search (Elasticsearch) handles structured data and full‑text retrieval, while embedding search (Milvus) enables similarity queries on vectorized representations. Additional steps include deduplication and relevance ranking to select the top‑K most relevant answers.
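The merge step can be sketched as follows. This is a minimal illustration, not the production code: it assumes both backends return `(doc_id, score)` pairs with scores already normalized to a common [0, 1] scale, and the function name `hybrid_top_k` is ours.

```python
def hybrid_top_k(keyword_hits, vector_hits, k=3):
    """Merge hits from keyword search (e.g. Elasticsearch) and vector
    search (e.g. Milvus), deduplicate by doc_id keeping the best score,
    and return the top-k results. Assumes pre-normalized scores."""
    best = {}
    for doc_id, score in keyword_hits + vector_hits:
        if score > best.get(doc_id, 0.0):
            best[doc_id] = score
    # Rank the deduplicated pool by score and keep the k best.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

In a real deployment the two score distributions differ (BM25 vs. cosine similarity), so a calibration or reciprocal-rank fusion step would precede this merge.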
2.3 LLM Capabilities
2.3.1 LLM
Understanding user questions, including error correction and keyword extraction, and prompting for more information when needed.
Secondary processing of local retrieval results (summarization, inference, deeper answer generation).
Contextual interaction for configuration‑related queries (e.g., car model comparison, fuel consumption, performance).
2.3.2 Local Search System
ES Search: Full‑text retrieval of structured data (car series, keywords, etc.).
Embedding Search: Convert textual queries to vectors and query vector databases such as Milvus.
Deduplication: Remove duplicate results to avoid redundant answers.
Relevance Ranking: Rank results and select the top‑K most relevant items.
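The deduplication step above can be as simple as dropping answers whose normalized text has already been seen. A toy version (the helper name is ours; a production system might use fuzzy or embedding-based similarity instead of exact matching):

```python
def dedupe_answers(answers):
    """Remove answers whose whitespace/case-normalized text already
    appeared, preserving the original order of first occurrences."""
    seen, unique = set(), []
    for text in answers:
        key = " ".join(text.lower().split())  # collapse case and spacing
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```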
3. Implementation
3.1 Engineering Architecture
3.1.1 Common Modules
Question rewriting: Refine user input using context history.
Question understanding: Generate vectors, keywords, tags, and classifications.
Recall module: Retrieve relevant content based on intent and entities.
Ranking module: Apply a relevance model to order recalled items and extract top‑N data.
Prompt module: Select appropriate prompts for downstream generation.
LLM model: Generate dialogue content based on prompts and ranked data.
Logging module: Record end‑to‑end request logs for training, performance analysis, and case lookup.
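The common modules above wire together into a single request flow. The sketch below is only illustrative of that orchestration: the stage names and the dict-of-callables interface are our assumptions, not the actual service API.

```python
def run_pipeline(question, history, stages):
    """Run the common modules end to end. `stages` maps module names
    (hypothetical) to callables implementing each step."""
    q = stages["rewrite"](question, history)      # question rewriting with context
    intent = stages["understand"](q)              # vectors, keywords, tags
    candidates = stages["recall"](intent)         # recall module
    top_n = stages["rank"](candidates)[:3]        # ranking module, extract top-N
    prompt = stages["prompt"](q, top_n)           # prompt selection/assembly
    reply = stages["llm"](prompt)                 # LLM generation
    stages["log"]({"q": q, "reply": reply})       # end-to-end request logging
    return reply
```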
3.1.2 Management Modules
Conversation management: Persist user Q&A history for context retention.
Prompt management: Configure prompts for different scenarios.
Plugin management: Enable channel‑specific plugins (e.g., IM user plugins).
Log management: Store, retrieve, and analyze request, performance, and result logs.
3.1.3 Knowledge Ingestion
Elasticsearch is used for structured data storage with the IK analyzer for full‑text search. Vectors generated by LLMs capture semantic similarity. The ingestion pipeline includes data import (DB, PDF, Word, etc.), preprocessing (filtering, slicing), paragraph generation, and model‑based vectorization. Milvus serves as the vector index storage.
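The preprocessing and slicing step can be sketched as a simple chunker that packs paragraphs up to a size budget before vectorization. This is a simplified stand-in (function name and chunk-size parameter are ours); real pipelines also handle PDF/Word extraction and filtering.

```python
def slice_paragraphs(text, max_chars=200):
    """Split imported text into paragraph-sized chunks no longer than
    max_chars, greedily packing adjacent paragraphs together."""
    chunks, buf = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if len(buf) + len(para) + 1 <= max_chars:
            buf = (buf + "\n" + para).strip()   # pack into current chunk
        else:
            if buf:
                chunks.append(buf)              # flush the full chunk
            buf = para
    if buf:
        chunks.append(buf)
    return chunks
```

Each chunk would then be embedded by the model and written to Milvus, with the raw text indexed in Elasticsearch.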
3.2 Recall
3.2.1 Recall Strategies
Two recall strategies are employed:
Plain recall: Extract keywords from the user query and perform ES full‑text search to retrieve N matching records.
Embedding recall: Encode the query into a vector and retrieve N nearest neighbors from Milvus.
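At its core, embedding recall is a nearest-neighbor search over vectors. The toy version below does exhaustive cosine similarity over an in-memory dict; Milvus performs the same operation at scale with an approximate-nearest-neighbor index. Names here are illustrative.

```python
import math

def embedding_recall(query_vec, index, n=2):
    """Return the n doc ids from `index` (doc_id -> vector) whose
    vectors have the highest cosine similarity to query_vec."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    scored = [(doc_id, cos(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:n]
```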
3.2.2 Relevance (Top‑K Selection)
An XGBoost-based boosting model combines query, item, and relevance features (including cross features), applies preprocessing such as normalization and log transformation, and predicts relevance scores; the top-K items are then selected for final ranking.
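The preprocessing and selection steps can be sketched in isolation (the trained XGBoost model itself is replaced here by externally supplied scores; function names are ours):

```python
import math

def preprocess(raw):
    """Log-transform heavy-tailed count features, then min-max normalize
    to [0, 1] -- the preprocessing described above, in miniature."""
    logged = [math.log1p(x) for x in raw]
    lo, hi = min(logged), max(logged)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in logged]

def top_k(items, scores, k=2):
    """Keep the k items with the highest model-predicted scores."""
    ranked = sorted(zip(items, scores), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:k]]
```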
3.3 LLM for the Automotive Domain
3.3.1 Model Overview
The proprietary automotive LLM “Jianjie” addresses the entire car‑ownership lifecycle, leveraging advanced NLP and deep‑learning techniques to provide accurate, comprehensive answers.
3.3.2 Training Data
Consumer reviews covering various brands and models.
Domain‑specific Q&A pairs (purchase, maintenance, repair).
Parts catalog with detailed configuration information.
Curated encyclopedia entries on automotive history, technology, and trends.
Webpages and books containing professional automotive knowledge.
3.3.3 Training Techniques
LoRA and QLoRA to reduce memory and GPU consumption.
RoPE scaling to extend context length of LLaMA models.
DPO training to simplify RLHF, replacing reward‑model + PPO pipelines.
Dataset streaming for efficient large‑scale data loading.
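The core idea behind LoRA, the first technique listed, is that instead of updating a full weight matrix W, training learns two small matrices A and B whose product forms a low-rank update ΔW = (α/r)·B·A. A dependency-free numeric sketch of that update (not the actual training code, which would use a library such as PEFT):

```python
def lora_delta(A, B, alpha=16, r=2):
    """Compute the LoRA update dW = (alpha/r) * B @ A, where A is r x d
    and B is d x r. Only A and B are trained; W stays frozen."""
    scale = alpha / r
    rows, cols = len(B), len(A[0])
    return [[scale * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(cols)] for i in range(rows)]
```

With r much smaller than the weight dimensions, the trainable parameter count (and hence GPU memory) drops sharply, which is the saving the bullet above refers to; QLoRA adds 4-bit quantization of the frozen weights on top.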
3.4 Evaluation
A custom automotive benchmark shows that the proprietary model outperforms open-source models in the vertical domain while matching them on general benchmarks.
3.4.1 Automotive Benchmark Items
Exam: Brand and configuration queries.
Domain sentiment analysis.
Category knowledge.
Auto‑RC and QCR‑RC reading‑comprehension tasks.
Domain‑specific QA.
3.4.2 General Benchmark Items
AFQMC (sentence‑pair matching).
OCNLI (natural language inference).
Weibo (text classification).
C3, CMRC, Harder‑RC (reading comprehension).
General QA covering knowledge, writing, reasoning, entertainment.
Some hallucination and format issues were observed, prompting ongoing iteration.
3.5 Result Integration
Summarization and abstraction of retrieved results.
Formatting for consistent presentation.
Deduplication and optional translation.
Context analysis to produce personalized, accurate answers.
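The integration steps above ultimately feed a prompt to the LLM. A minimal sketch of that prompt assembly (the template wording and function name are illustrative, not the production prompt):

```python
def build_answer_prompt(question, results, history=()):
    """Assemble the final-generation prompt: numbered retrieved snippets
    plus recent conversation turns, instructing the model to stay
    grounded in the references."""
    ctx = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    refs = "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(results))
    return (
        "Answer using only the references below.\n"
        f"References:\n{refs}\n"
        + (f"Conversation so far:\n{ctx}\n" if ctx else "")
        + f"Question: {question}\nAnswer:"
    )
```

Constraining the model to the supplied references is one of the main levers for reducing hallucination in this architecture.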
4. Example Demonstrations
Sample queries such as “What are the dimensions of the 2023 Mercedes‑Benz GLC260?” or “What is the fuel consumption of a BMW 3 Series?” are answered with generated responses and accompanying images.
5. Conclusion
The LLM + search architecture merges intent understanding, intelligent retrieval, and result enhancement, reducing model hallucination and delivering more precise, context‑aware answers for automotive queries.
HomeTech tech sharing