Optimizing Question‑Answer Search Similarity in Haodf Online: A Semantic Similarity Model Case Study
This article describes how Haodf Online improved its medical question‑answer search by analyzing search challenges, adopting semantic similarity models based on pre‑trained language embeddings, designing contrastive training tasks, and evaluating the resulting increase in click‑through rate and user engagement.
With the rapid advancement of natural language processing, many tasks that were previously hard to automate can now be deployed in online services, delivering efficient user experiences. This article records Haodf Online's exploration of optimizing question‑answer search similarity.
In a search engine, recall and ranking are crucial; the core problems include understanding user intent, identifying truly relevant documents, and ensuring trustworthy content.
Haodf's search faces specific challenges: medical queries are highly descriptive and often colloquial, while the corpus is rigorously vetted, which limits the amount of noisy data available. Traditional BM25 relies on exact term matching supplemented by synonym dictionaries, and it struggles with misspellings, vague expressions, and low‑frequency medical terms.
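To make BM25's limitation concrete, here is a minimal sketch of BM25 scoring over pre-tokenized documents (function and variable names are illustrative, not from Haodf's system). Because the score is built purely from exact term overlap, a paraphrase that shares no surface terms with the query scores zero, no matter how close it is semantically.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one document against a query with BM25 (illustrative sketch).

    corpus is a list of tokenized documents, used for document
    frequencies and the average document length.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]  # f == 0 for any query term absent from the document
        score += idf * (f * (k1 + 1)) / (
            f + k1 * (1 - b + b * len(doc_terms) / avgdl)
        )
    return score
```

For example, the query "hand swelling" gives a positive score to a document containing those exact tokens, but zero to the paraphrase "palm swollen", which is exactly the mismatch a semantic similarity model is meant to fix.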
The goal is to develop a relevance scoring model that takes two sentences or documents as input and outputs a similarity score, so that semantically close results rank higher (e.g., "hand swelling pain" vs. "hand palm swelling pain").
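Such a scorer is commonly deployed as a bi-encoder: each text is encoded into a vector, and candidates are ranked by cosine similarity to the query vector. The sketch below shows only the scoring and ranking step, with the encoder abstracted away; the function names are illustrative assumptions, not Haodf's actual interface.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(query_vec, candidates):
    """candidates: list of (doc_id, embedding) pairs.

    Returns (doc_id, score) pairs sorted by similarity, highest first.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With good embeddings, "hand swelling pain" and "hand palm swelling pain" land near each other in vector space even though their token sets differ, which is what lets the semantic model outrank BM25 on paraphrased queries.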
Recent NLP breakthroughs enable encoding text into vectors; models have evolved from Word2Vec to large pre‑trained models like FLAN. However, large models are costly for online inference, so Haodf explores smaller models with knowledge distillation (DistilBERT, AutoTinyBERT) and inference optimizations (TVM).
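The core of the knowledge-distillation approaches mentioned above (as in DistilBERT) is training a small student model to match a large teacher's temperature-softened output distribution. A minimal sketch of that loss, assuming plain logit arrays rather than any specific framework:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numeric stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in the standard distillation recipe (Hinton et al.)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student reproduces the teacher exactly and grows as their distributions diverge; in practice it is combined with a task loss on labeled data, and the distilled model is then cheap enough for online inference.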
Training tasks progress from pointwise to pairwise and listwise ranking, with contrastive learning approaches such as SimCSE and sentence‑BERT. Additional modules enforce domain‑specific entity alignment and topic consistency.
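The contrastive objective used by SimCSE-style training is typically an in-batch InfoNCE loss: each query is pulled toward its paired positive and pushed away from every other document in the batch. A minimal NumPy sketch under that assumption (the exact loss and temperature in Haodf's system are not specified in the article):

```python
import numpy as np

def info_nce_loss(q_emb, d_emb, temperature=0.05):
    """In-batch contrastive loss.

    q_emb, d_emb: (B, dim) arrays where q_emb[i] and d_emb[i] form a
    positive pair; every other row of d_emb serves as a negative.
    """
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    sim = (q @ d.T) / temperature           # (B, B) similarity matrix
    sim -= sim.max(axis=1, keepdims=True)   # numeric stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))  # cross-entropy on the diagonal
```

Pairwise and listwise ranking objectives can be seen as variants of the same idea: the model is trained on relative order among candidates rather than on absolute relevance labels.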
Initial experiments using a modest dataset produced a model that distinguishes descriptive medical texts, but issues remained with rare drug names and noisy query terms.
To strengthen the model, Haodf designed new unsupervised and semi‑supervised tasks inspired by MUM and other works, including entity detection, replacement, correction, and rewrite contrastive tasks, employing losses like am‑softmax and KL‑regularized dropout.
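Of the losses named above, am-softmax is the additive-margin variant of softmax cross-entropy: a fixed margin is subtracted from the positive class's cosine similarity before scaling, forcing the positive to beat the negatives by a buffer rather than just barely. A minimal sketch for a single example (scale and margin values are common defaults, not Haodf's reported settings):

```python
import numpy as np

def am_softmax_loss(cos_sims, target, s=30.0, m=0.35):
    """Additive-margin softmax loss for one example.

    cos_sims: (C,) cosine similarities to each candidate/class;
    target: index of the positive. The margin m is subtracted from
    the positive similarity before scaling by s.
    """
    logits = s * np.asarray(cos_sims, dtype=float)
    logits[target] = s * (cos_sims[target] - m)
    logits -= logits.max()  # numeric stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return float(-log_prob[target])
```

Because the margin handicaps the positive during training, the loss stays nonzero until the positive's similarity clears the negatives by at least m, which tightens the decision boundary; KL-regularized dropout (as in R-Drop) complements this by penalizing divergence between two dropout-perturbed forward passes of the same input.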
After deploying the optimized similarity model, the click‑through rate for Q&A search results increased by 4.6%, and the average user interaction length grew by 8.5%, indicating better relevance and user satisfaction.
Future work will continue to enrich medical knowledge bases, refine data and model capabilities, and address remaining challenges in semantic similarity for medical search.
HaoDF Tech Team
HaoDF Online tech practice and sharing—join us to discuss and help create quality healthcare through technology.