Learning to Rank: From Regression to Search Ranking and Evaluation Methods
Learning to rank reframes search as a machine-learning problem that optimizes document ordering rather than numeric prediction. It evaluates results with relevance metrics such as NDCG, scores documents with feature-based functions, and compares point-wise, pair-wise (RankSVM), and list-wise (ListNet) approaches, stressing that defining the right error and selecting good features matter more than the specific algorithm.
Learning to rank (LTR) treats search as a machine‑learning problem. This article explains why search differs from classic regression or classification tasks and how to evaluate LTR methods.
Measuring Search Quality
Unlike regression where error is the residual between actual and predicted values, search quality is measured by how close the returned order is to an ideal ranking. Metrics such as NDCG (Normalized Discounted Cumulative Gain) capture both relevance and position bias, ranging from 0 (worst) to 1 (best). Other metrics like ERR and MAP exist, but NDCG is used as the primary relevance indicator here.
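To make the metric concrete, here is a minimal sketch of NDCG with the common logarithmic position discount. It uses linear gain (the raw relevance grade); another widely used variant uses 2^grade − 1 instead:

```python
import math

def dcg(grades):
    """Discounted cumulative gain: each grade is discounted by
    log2(position + 1), so errors near the top cost more."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    """Normalize by the DCG of the ideal (descending-grade) ordering,
    giving a score between 0 (worst) and 1 (best)."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0
```

For example, `ndcg([3, 2, 1])` returns 1.0 (the order is already ideal), while swapping a relevant document below an irrelevant one lowers the score.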
Generating a Ranking Function with Machine Learning
Traditional regression learns a function f(x) that predicts a numeric target. In LTR we learn a ranking function f(d, q) that assigns a score to each document d for a query q. The goal is to maximize NDCG over a set of training queries.
Typical features for e‑commerce search include:
TF‑IDF score of the query term in the product title: titleScore(d, q)
TF‑IDF score in the product description: descScore(d, q)
Number of units sold: numSold(d)
A possible scoring formula learned from data is:
f(d, q) = 10 * titleScore(d, q) + 2 * descScore(d, q) + 5 * numSold(d)
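Applied at query time, this formula scores each candidate document and sorts by the result. The document IDs and feature values below are illustrative only (numSold is assumed to be normalized):

```python
# Each document carries precomputed feature values for the query
# (illustrative numbers; numSold is assumed normalized to [0, 1]).
docs = [
    {"id": "dress_shoes",     "titleScore": 0.9, "descScore": 0.4, "numSold": 0.8},
    {"id": "meh_dress_shoes", "titleScore": 0.7, "descScore": 0.5, "numSold": 0.2},
]

def f(d):
    # The linear scoring formula from the article; weights learned from data.
    return 10 * d["titleScore"] + 2 * d["descScore"] + 5 * d["numSold"]

ranked = sorted(docs, key=f, reverse=True)
```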
Three major families of LTR algorithms are discussed:
Point‑wise (single‑document) learning to rank
Each query‑document pair is given a relevance grade (e.g., 4 for highly relevant, 0 for irrelevant). The model predicts the grade using regression and minimizes the residual y - f(d, q). This approach ignores the ordering of documents and therefore often fails to capture position bias.
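A point-wise model can be fit by least squares on (feature vector, grade) pairs; the training data below is made up for illustration:

```python
import numpy as np

# Rows: [titleScore, descScore, numSold]; targets: relevance grades 0-4.
# All values here are illustrative, not real training data.
X = np.array([[0.9, 0.4, 0.8],
              [0.7, 0.5, 0.2],
              [0.1, 0.2, 0.1],
              [0.5, 0.9, 0.6]])
y = np.array([4.0, 2.0, 0.0, 3.0])

# Minimize the squared residual y - f(d, q) over linear weights w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ w  # per-document grade predictions
```

Sorting documents by `predicted` yields the ranking, but nothing in the loss penalizes putting a grade-2 document above a grade-4 one more heavily near the top of the list.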
List‑wise and Pair‑wise methods
These approaches directly optimize the ordering. List‑wise methods (e.g., ListNet) define a loss based on the probability of the entire permutation matching the ideal order. Pair‑wise methods (e.g., RankSVM) treat each document pair as a binary classification problem, encouraging the higher‑ranked document to receive a larger score.
ListNet Example (Python‑style pseudocode)
def error():
    error = 0
    for query in queries:
        for doc in query.docs:
            error -= TopOneP(doc.grade) * log(TopOneP(f(doc, query)))
    return error
Here TopOneP denotes the probability that a document is ranked first given its score; in ListNet this is typically computed as a softmax over the scores (or grades) of all documents for the query, so the loss is a cross-entropy between the ideal and predicted top-one distributions.
RankSVM
RankSVM constructs feature differences between document pairs and trains a linear SVM to classify which document should be ranked higher. For example, if dress_shoes sold 5000 units and meh_dress_shoes sold 1000, the pairwise training instance has feature difference 4000 (5000 − 1000) with label 1, meaning the first document should rank higher.
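The construction can be sketched end to end. Here a simple perceptron stands in for the linear SVM (same decision rule, simpler training), and the feature values are illustrative:

```python
import numpy as np

# Pairwise training data: feature-vector differences (better - worse)
# with label +1, plus the mirrored pairs with label -1.
# Features: [titleScore, numSold_in_thousands]; values are illustrative.
diffs = np.array([[0.2, 4.0],    # e.g. dress_shoes - meh_dress_shoes
                  [-0.2, -4.0],  # mirrored pair
                  [0.5, 1.0],
                  [-0.5, -1.0]])
labels = np.array([1, -1, 1, -1])

# Perceptron stand-in for the linear SVM: learn w such that
# sign(w . diff) matches each pairwise label.
w = np.zeros(2)
for _ in range(100):
    for x, y in zip(diffs, labels):
        if y * (w @ x) <= 0:
            w += y * x
```

At query time the learned weight vector w scores individual documents directly (score = w · features), so ranking is just a sort; only training operates on pairs.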
Both RankSVM and ListNet have trade‑offs: RankSVM is simple and fast but ignores position bias; ListNet captures position bias but is computationally heavier.
Conclusion
The key takeaway is that regardless of the chosen model, understanding what error to minimize and selecting appropriate features are more critical than the specific algorithm.
vivo Internet Technology