OneSearch‑V2 Launches: Self‑Distilled Generative Search That Truly Understands Users
OneSearch‑V2 introduces a latent‑reasoning enhanced self‑distillation framework that augments query understanding with thought‑augmented CoT, aligns preferences via direct user behavior feedback, and achieves up to 4 % CTR lift and significant order growth without adding inference cost or latency.
Background
In large‑scale e‑commerce search, three major bottlenecks limit generative retrieval: (1) insufficient understanding of complex or ambiguous queries, (2) weak personalization of user intent, and (3) reward models that over‑fit narrow historical preferences.
OneSearch V1 Review
OneSearch V1 reduced inference cost and improved performance on mid‑frequency queries, but its end‑to‑end generation struggled with short, ambiguous queries and long‑tail variations. It served about one‑third of page views yet contributed roughly 8% of conversions.
Limitations of V1
Gaps in understanding complex queries (e.g., a broad query such as “indoor fitness equipment” versus queries for specific items).
Inadequate personalization; the model relied heavily on historical co‑occurrence.
Reward system prone to distribution shift and over‑fitting.
OneSearch V2 Design
Latent Reasoning‑Enhanced Self‑Distillation (LR‑ESD)
The LR‑ESD pipeline encodes keyword‑level chain‑of‑thought (CoT) reasoning into model weights. A teacher model receives the full query augmented with high‑density keywords, while a student receives the raw query. An information‑asymmetric KL loss forces the student to reproduce the teacher’s predictions, internalizing reasoning without extra parameters or inference tokens.
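The information‑asymmetric KL objective described above can be illustrated with a minimal sketch. It assumes the teacher and student each emit per‑position logits over the same SID vocabulary, that the loss is KL(teacher ‖ student) averaged over positions, and that the teacher distribution is treated as a fixed target. The KL direction, the averaging, and all function names are assumptions for illustration, not details confirmed by the source.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def self_distillation_loss(teacher_logits, student_logits):
    """Mean per-position KL(teacher || student) for one SID token sequence.

    teacher_logits: logits produced when the input is query + CoT keywords.
    student_logits: logits produced when the input is the raw query only.
    (Both inputs are hypothetical here; only the loss shape is shown.)
    """
    losses = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)

# Toy example: 2 SID positions, vocabulary of 3 tokens per position.
teacher_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student_logits = [[1.0, 0.8, -0.5], [0.2, 0.9, 0.4]]

loss = self_distillation_loss(teacher_logits, student_logits)
identical = self_distillation_loss(teacher_logits, teacher_logits)
print(f"distillation loss: {loss:.4f}, loss when student matches teacher: {identical:.4f}")
```

The loss reaches zero only when the student, seeing the raw query alone, reproduces the distribution the teacher produced with the extra keywords, which is the sense in which the reasoning is pushed into the weights rather than into extra inference tokens.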
Thought‑Augmented Query Understanding
A three‑step pipeline processes each query:
Intent, category, and attribute extraction.
Keyword extraction using an LLM‑generated, high‑density keyword‑based CoT.
Preference calibration with user‑profile signals.
This “thought‑augmented” stage avoids long‑text generation, preserving semantic richness while keeping latency low.
Behavior‑Feedback Preference Alignment
Instead of a separate reward model, V2 directly incorporates real user actions (clicks, purchases) into a composite reward. Token‑Position Marginal Advantage GRPO (TPMA‑GRPO) then distributes credit across the hierarchical SID sequence, giving higher weight to early coarse tokens and aligning generation with downstream conversion signals.
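A simplified sketch of the idea follows. It is not the paper's exact TPMA‑GRPO formulation: it assumes a composite reward that is a weighted sum of a relevance score, a click flag, and a purchase flag, a standard GRPO‑style group‑normalized advantage, and a geometric decay that gives earlier (coarser) SID positions more credit. The reward weights, the decay value, and every function name are hypothetical.

```python
import statistics

def composite_reward(relevance, clicked, purchased,
                     w_rel=1.0, w_click=0.5, w_order=1.0):
    """Weighted sum of relevance and behavior signals (weights are assumed)."""
    return w_rel * relevance + w_click * float(clicked) + w_order * float(purchased)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def position_weights(seq_len, decay=0.5):
    """Normalized weights that shrink geometrically with token position."""
    raw = [decay ** i for i in range(seq_len)]
    total = sum(raw)
    return [w / total for w in raw]

def token_level_advantages(rewards, seq_len, decay=0.5):
    """Spread each candidate's advantage over its SID tokens by position."""
    weights = position_weights(seq_len, decay)
    return [[a * w for w in weights] for a in group_relative_advantages(rewards)]

# Toy group: three candidate SID sequences generated for the same query.
rewards = [
    composite_reward(0.9, clicked=True, purchased=True),
    composite_reward(0.8, clicked=True, purchased=False),
    composite_reward(0.3, clicked=False, purchased=False),
]
weights = position_weights(3)
token_advs = token_level_advantages(rewards, seq_len=3)

for reward, advs in zip(rewards, token_advs):
    print(f"reward={reward:.2f} -> " + ", ".join(f"{a:+.3f}" for a in advs))
```

In this sketch the purchased candidate receives a positive advantage concentrated on its first token, while the candidate with no interaction receives a negative one, so the coarse category‑level decision absorbs most of the behavioral signal.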
Experimental Evaluation
Offline Results
On a test set of about 30,000 PVs containing 7,229 orders, V2 achieved Order HR@10 = 0.2314 and Click HR@10 = 0.2568, a 2.68% absolute HR lift over V1. In the ablation study, self‑distillation contributed the largest single gain (Order HR@10 +1.17%, Click HR@10 +1.67%). Adding R‑Drop, FGM, and focal loss yielded synergistic improvements, with that ablation configuration reaching Order HR@10 = 0.2214 and Click HR@10 = 0.2471.
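For readers unfamiliar with the metric, the sketch below shows one common way to compute HR@K (hit rate at K): the fraction of test queries for which at least one ground‑truth item (clicked or ordered) appears among the top K generated items. The source does not spell out its exact definition, so this formulation, the function name, and the toy data are assumptions.

```python
def hit_rate_at_k(predictions, targets, k=10):
    """Fraction of queries whose top-k predictions contain a target item.

    predictions: dict mapping query id -> ranked list of item ids.
    targets: dict mapping query id -> set of ground-truth item ids
             (e.g. ordered items for Order HR, clicked items for Click HR).
    """
    if not targets:
        return 0.0
    hits = 0
    for query_id, target_items in targets.items():
        top_k = predictions.get(query_id, [])[:k]
        if any(item in target_items for item in top_k):
            hits += 1
    return hits / len(targets)

# Toy data: three queries with ranked candidates and their ordered items.
predictions = {"q1": ["a", "b", "c"], "q2": ["d", "e"], "q3": ["f"]}
ordered = {"q1": {"c"}, "q2": {"x"}, "q3": {"f"}}

order_hr_at_10 = hit_rate_at_k(predictions, ordered, k=10)
order_hr_at_2 = hit_rate_at_k(predictions, ordered, k=2)
print(order_hr_at_10, order_hr_at_2)
```

Under this definition, an Order HR@10 of 0.2314 would mean that for roughly 23% of test PVs an item the user actually ordered appeared in the model's top ten.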
Online A/B Test Results
Deploying OneSearch‑V2 on the full platform increased overall CTR by 3.98 %, buyer count by 2.07 %, and order volume by 2.11 % without any increase in latency. Incremental deployments (RAG, Reasoning, Full) showed monotonic improvements.
Human Evaluation
Manual assessment of 3,200 query‑item pairs showed gains of +1.37% in page quality, +0.55% in item relevance, and +1.65% in query‑item relevance.
In‑Depth Analysis
User/Query Frequency and Cold‑Start
V2 outperformed V1 across all user activity tiers, query frequencies, and item popularity levels. Gains were most pronounced for low‑activity users and cold‑start items, turning V1’s inverted‑U CTR curve into a balanced U‑shape.
Industry‑Level CTR Gains
Average CTR uplift was 3.98 % across all verticals. Categories with rich but ambiguous titles (apparel, cosmetics, hardware) saw the largest relative improvements, confirming the effectiveness of CoT‑driven semantic enhancement.
CoT Keyword Coverage
Online pipelines continuously refresh the CoT keyword repository. Coverage of complex queries rose steadily during March 2026, ensuring stable self‑distillation updates.
Relevance‑Conversion Trade‑off
TPMA‑GRPO balances relevance rewards with post‑hoc conversion signals. The RAG‑only variant improved relevance but lagged in conversion; the full V2 model achieved higher CTCVR (0.242 % vs. 0.231 %) by internalizing reasoning rather than merely boosting relevance scores.
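CTCVR is conventionally the share of impressions that lead to both a click and a conversion, i.e. CTR multiplied by the post‑click conversion rate. The short calculation below converts the two CTCVR figures quoted above into a relative lift; the helper name is hypothetical, and the arithmetic assumes both figures are percentages measured on comparable traffic.

```python
def relative_lift(treatment, control):
    """Relative improvement of treatment over control, as a fraction."""
    return (treatment - control) / control

# CTCVR values quoted in the text, both expressed in percent.
ctcvr_full_v2 = 0.242
ctcvr_compared = 0.231

lift = relative_lift(ctcvr_full_v2, ctcvr_compared)
print(f"relative CTCVR lift: {lift:.2%}")  # prints "relative CTCVR lift: 4.76%"
```

The absolute gap of 0.011 percentage points looks small, but relative to the comparison value it is a lift of nearly 5%.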
TPMA Flexibility in Promotions
During the 3.18 shopping festival, TPMA was used to give higher relevance rewards to emerging merchants, successfully elevating their rankings and demonstrating real‑time, business‑driven reward adjustment.
Future Directions
Beyond‑log training strategies for long‑tail queries with scarce interaction data.
Unified SID encoding that jointly represents text, images, video, and live‑stream content.
Agentic search systems with efficient online learning that preserve latency guarantees.
Paper and Code
Full technical details are available in the arXiv paper: https://arxiv.org/abs/2603.24422
Implementation can be explored at the GitHub repository: https://github.com/benchen4395/onesearch-family
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
