Integrating Machine Learning Ranking into Elasticsearch: Architecture, Components, and Performance
The team embedded a full machine-learning ranking pipeline into Elasticsearch as a plug-in, combining real-time and offline feature stores, hot-loadable model JARs distributed via Dragonfly, an MLeap execution engine, and a DSL for feature definition. Replacing the coarse-ranking logistic-regression model with a tree model adds ~10 ms of latency but yields a 1.2% AB-test lift, while maintaining high throughput and low CPU usage and leaving room for future batched deep-learning rescoring.
This article describes how a machine-learning-driven ranking capability was migrated into Elasticsearch (ES) to build a high-performance, distributed search-ranking system for Hello's four-wheel driver-passenger matching scenario. By replacing the coarse-ranking Logistic Regression (LR) model with a tree-based model, the team achieved a 1.2% AB-test lift.
Background
The team is responsible for recall and ranking in a car‑pool matching service. The existing pipeline follows a classic coarse‑ranking → fine‑ranking → re‑ranking cascade. Coarse ranking uses LR on thousands of orders, fine ranking selects the top 300 for deeper scoring, and re‑ranking picks the final 10.
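The cascade above can be sketched as successive top-k cuts with progressively more expensive scorers. This is a minimal illustration only; the scorer functions below are hypothetical placeholders, not the production LR or tree models.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Minimal sketch of the coarse -> fine -> re-rank cascade.
public class RankingCascade {
    // Keep the k highest-scoring items under the given scorer.
    static List<Integer> topK(List<Integer> items,
                              java.util.function.ToDoubleFunction<Integer> scorer, int k) {
        return items.stream()
                .sorted(Comparator.comparingDouble(scorer).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Thousands of candidate orders enter coarse ranking.
        List<Integer> orders = IntStream.range(0, 2000).boxed().collect(Collectors.toList());
        List<Integer> coarse = topK(orders, o -> o % 997, 300); // coarse model keeps top 300
        List<Integer> fine   = topK(coarse, o -> o % 101, 50);  // fine model rescored
        List<Integer> last   = topK(fine,   o -> o,       10);  // re-rank picks the final 10
        System.out.println(last.size()); // 10
    }
}
```

Each stage trades candidate volume for model cost, which is why widening the coarse stage (the 300-order limit) mattered most.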
To improve recall quality, the coarse‑ranking stage was upgraded from LR to a tree model that can run fully inside ES, eliminating the previous 300‑order limitation and enabling programmatic configuration of the model.
Overall Solution
The solution embeds the entire ML workflow as a plug‑in inside ES, consisting of several key components:
Feature acquisition (real‑time, context, offline, and composite features)
KKV system for low‑latency offline feature retrieval
Hot‑load mechanism to update plug‑ins without restarting the cluster
Execution engine (based on MLeap) that loads and predicts models of various types (tree, deep learning, custom)
File distribution system (Dragonfly) for synchronising model, KV, and JAR files across the cluster
Feature generation DSL for declarative feature configuration
Debug API to expose per‑item prediction details
Machine‑Learning Online Prediction Flow
Features are collected as inputs, fed to the model, and a prediction score is returned. Feature types include real‑time (e.g., driver features from Flink), context, offline (loaded via KKV), and composite features generated by a custom DSL.
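A composite feature can be pictured as a small expression over base features. The ratio helper below is illustrative only; the actual DSL syntax used by the plug-in is not shown in the article, and the feature names are invented.

```java
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Sketch of a composite feature derived from two base features.
public class CompositeFeature {
    // ratio(num, den): num / den, guarded against missing keys and division by zero.
    static ToDoubleFunction<Map<String, Double>> ratio(String num, String den) {
        return f -> f.getOrDefault(num, 0.0) / Math.max(f.getOrDefault(den, 1.0), 1e-9);
    }

    public static void main(String[] args) {
        Map<String, Double> base = Map.of("finished_orders", 90.0, "total_orders", 100.0);
        double completionRate = ratio("finished_orders", "total_orders").applyAsDouble(base);
        System.out.println(completionRate); // 0.9
    }
}
```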
Key Components
KKV System: Provides online access to offline features with minute-level freshness. It uses mmap memory-mapping to handle billions of feature entries efficiently.
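The mmap idea can be illustrated with Java's standard `MappedByteBuffer`: the file is mapped into memory once, and lookups become direct offset reads with no per-request I/O. This is a sketch of the principle only; the KKV system's real key layout and index format are not described in the article.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative mmap-backed fixed-width feature store (one double per slot).
public class MmapFeatureStore {
    static final int SLOT_BYTES = 8; // assumption: one double feature per slot
    private final MappedByteBuffer buf;

    MmapFeatureStore(Path file, int slots) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) slots * SLOT_BYTES);
        }
    }

    void put(int slot, double value) { buf.putDouble(slot * SLOT_BYTES, value); }
    double get(int slot)             { return buf.getDouble(slot * SLOT_BYTES); }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("kkv", ".bin");
        MmapFeatureStore store = new MmapFeatureStore(f, 1024);
        store.put(42, 3.14);               // offline job writes a feature
        System.out.println(store.get(42)); // online lookup reads the mapped page: 3.14
    }
}
```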
Hot-Load: Abstracts ES script plug-ins so that new JARs can be loaded on-the-fly via Dragonfly, avoiding full cluster restarts.
Execution Engine: Uses MLeap to serialize Spark ML models and serve them through a unified prediction API. Supports both Spark ML and deep-learning models.
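The "unified API" idea can be sketched as one predict interface that every model type sits behind, whether the backing implementation is an MLeap bundle, a deep model, or custom code. The names below are illustrative, not the plug-in's actual classes, and the stand-in model replaces real MLeap deserialization.

```java
import java.util.Map;

// Hypothetical unified model interface: all model types expose one predict call.
interface Model {
    double predict(Map<String, Double> features);
}

public class ExecutionEngine {
    // Stand-in "model"; a real engine would deserialize an MLeap bundle here.
    static Model loadLinearModel(Map<String, Double> weights) {
        return feats -> feats.entrySet().stream()
                .mapToDouble(e -> e.getValue() * weights.getOrDefault(e.getKey(), 0.0))
                .sum();
    }

    public static void main(String[] args) {
        Model m = loadLinearModel(Map.of("distance", -0.5, "rating", 2.0));
        double score = m.predict(Map.of("distance", 4.0, "rating", 4.5));
        System.out.println(score); // -0.5*4.0 + 2.0*4.5 = 7.0
    }
}
```

Keeping the interface model-agnostic is what lets the plug-in swap LR for a tree model without touching the scoring call sites.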
File Distribution System (Dragonfly): Listens for file changes, downloads updated files, verifies MD5, and triggers business callbacks. Example annotation usage:
// File listener
@Dragonfly(storagePath = "model/lo_cc_deep_model_v1.tar.gz")
public void getDeepFMModel(File file, String path) throws IOException {
    super.getDeepFMModel(file, path);
}

// Multiple callbacks on the same file
@Dragonfly(storagePath = "model/lo_cc_deep_model_v1.tar.gz")
public void test(File file, String path) throws IOException {
    System.out.println("unit test 3");
}

// Directory monitoring
@Dragonfly(dirPathMonitor = "model/deep", filterBean = ApolloFilter.class)
public void getDeepFMModel(File file, String path) throws IOException {
    super.getDeepFMModel(file, path);
}

Debug: Exposes detailed prediction information via the ES ScoreScript API, e.g.:
explanation.set(String.format("mleap(policyName=%s,params=%s,detail=%s)", this.sortPolicy, params, tempMap));

Stability Measures
Comprehensive load‑testing before release
Extreme stress tests on model, plugin, and feature changes
Gray‑release strategy using Dragonfly to control rollout per model or JAR
Model Loading Grouping
Different business models load only the required feature subsets by assigning indices to specific ES groups, reducing unnecessary resource consumption.
Performance Acceleration
Request cache for user features
Object reuse across rows
Model cache with pre‑allocated input/output schemas
Global cache via mmap for high‑frequency features
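The request-cache item above can be sketched with a `computeIfAbsent` memo: within one search request, user-level features are fetched once and reused for every candidate row. The fetch body here is a placeholder for the real feature lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Per-request feature cache: fetch once, reuse across all scored rows.
public class RequestFeatureCache {
    private final Map<String, double[]> cache = new ConcurrentHashMap<>();
    final AtomicInteger fetches = new AtomicInteger();

    double[] userFeatures(String userId) {
        return cache.computeIfAbsent(userId, id -> {
            fetches.incrementAndGet();                 // expensive lookup happens once
            return new double[]{id.hashCode() % 7, 1.0}; // placeholder feature vector
        });
    }

    public static void main(String[] args) {
        RequestFeatureCache c = new RequestFeatureCache();
        for (int row = 0; row < 300; row++) c.userFeatures("user-1"); // 300 candidate rows
        System.out.println(c.fetches.get()); // 1 -- fetched once, reused 299 times
    }
}
```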
Business Impact After Deployment
Support for all Spark models
Zero‑code model iteration through feature configuration, hot‑load, load‑testing, and gray‑release
Modular, pluggable algorithm components enabling multi‑stage ranking
Hot‑load ensures JAR updates without service jitter
Flame-graph analysis shows the ranking stage consumes only ~7% of a single core's CPU
Tree model adds ~10 ms latency compared to LR, but yields a 1.2% AB lift in the core car‑pool scenario
Future Work
The team plans to address ES’s row‑by‑row limitation for deep‑learning models by developing a rescore plug‑in that can batch TensorFlow sessions. Integration with OpenVINO is also under investigation to provide a unified inference API across TensorFlow, PyTorch, PaddlePaddle, and Caffe.
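The planned batching can be pictured as follows: instead of one model call per document, the rescorer collects the top-N hits' feature rows and makes a single batched forward pass. The `BatchModel` below is a hypothetical placeholder for a TensorFlow session or OpenVINO call; none of these names come from the article.

```java
import java.util.Arrays;

// Hypothetical batched rescore: one inference call for all top-N hits.
public class BatchRescorer {
    interface BatchModel { double[] predict(double[][] batch); }

    static double[] rescore(double[][] topHitFeatures, BatchModel model) {
        return model.predict(topHitFeatures); // one session call for the whole batch
    }

    public static void main(String[] args) {
        // Stand-in model: score each row as the sum of its features.
        BatchModel sum = batch -> Arrays.stream(batch)
                .mapToDouble(row -> Arrays.stream(row).sum()).toArray();
        double[] scores = rescore(new double[][]{{1, 2}, {3, 4}}, sum);
        System.out.println(Arrays.toString(scores)); // [3.0, 7.0]
    }
}
```

Batching amortizes per-call overhead (session setup, graph dispatch) that makes row-by-row deep-model scoring impractical inside ES.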
The ultimate goal is to complete end‑to‑end ranking inside ES, achieving a “search‑to‑rank‑in‑one” system with maximal performance and flexibility.
HelloTech
Official Hello technology account, sharing tech insights and developments.