
Integrating Machine Learning Ranking into Elasticsearch: Architecture, Components, and Performance

The team embedded a full machine‑learning ranking pipeline as an Elasticsearch plug‑in, combining real‑time and offline feature stores, hot‑loadable model JARs via Dragonfly, an MLeap execution engine, and a DSL for feature definition. Replacing the coarse‑ranking logistic‑regression model with a tree model adds ~10 ms latency but yields a 1.2% AB‑test lift, while maintaining high throughput and low CPU usage and leaving room for future batch deep‑learning rescoring.

HelloTech

This article describes how a machine‑learning‑driven ranking capability was migrated into Elasticsearch (ES) to build a high‑performance, distributed search‑ranking system for Hello's four‑wheel driver‑passenger matching scenario. By replacing the coarse‑ranking Logistic Regression (LR) model with a tree‑based model, the team achieved a 1.2% AB‑test lift.

Background

The team is responsible for recall and ranking in a car‑pool matching service. The existing pipeline follows a classic coarse‑ranking → fine‑ranking → re‑ranking cascade. Coarse ranking uses LR on thousands of orders, fine ranking selects the top 300 for deeper scoring, and re‑ranking picks the final 10.
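Sketched in Java, the cascade amounts to successive score‑and‑truncate passes. The stage sizes (thousands → 300 → 10) come from the pipeline above; the scoring lambdas below are placeholder stand‑ins, not the production models:

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RankingCascade {

    // Score every candidate with the given model and keep the top k by score.
    static List<Integer> topK(List<Integer> orders, ToDoubleFunction<Integer> model, int k) {
        return orders.stream()
                .sorted(Comparator.comparingDouble(model).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> candidates = IntStream.range(0, 2000).boxed().collect(Collectors.toList());
        // Placeholder scorers stand in for the coarse, fine, and re-rank models.
        List<Integer> coarseTop = topK(candidates, o -> (o * 31) % 1000 / 1000.0, 300); // coarse: thousands -> 300
        List<Integer> fineTop   = topK(coarseTop,  o -> (o * 17) % 1000 / 1000.0, 300); // fine: deeper re-scoring
        List<Integer> finalTop  = topK(fineTop,    o -> o / 2000.0, 10);                // re-rank: final 10
        System.out.println(coarseTop.size() + " -> " + finalTop.size()); // 300 -> 10
    }
}
```

Each stage trades model cost against candidate volume: the cheap model sees everything, the expensive models see only survivors.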

To improve recall quality, the coarse‑ranking stage was upgraded from LR to a tree model that can run fully inside ES, eliminating the previous 300‑order limitation and enabling programmatic configuration of the model.

Overall Solution

The solution embeds the entire ML workflow as a plug‑in inside ES, consisting of several key components:

Feature acquisition (real‑time, context, offline, and composite features)

KKV system for low‑latency offline feature retrieval

Hot‑load mechanism to update plug‑ins without restarting the cluster

Execution engine (based on MLeap) that loads and predicts models of various types (tree, deep learning, custom)

File distribution system (Dragonfly) for synchronising model, KV, and JAR files across the cluster

Feature generation DSL for declarative feature configuration

Debug API to expose per‑item prediction details

Machine‑Learning Online Prediction Flow

Features are collected as inputs, fed to the model, and a prediction score is returned. Feature types include real‑time (e.g., driver features from Flink), context, offline (loaded via KKV), and composite features generated by a custom DSL.
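A minimal sketch of the assembly step, assuming features arrive as per‑source key‑value maps; the feature names below are illustrative, not the production schema:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FeatureAssembler {

    // Merge the four feature sources into one row; later sources win on key clashes.
    @SafeVarargs
    static Map<String, Double> assemble(Map<String, Double>... sources) {
        Map<String, Double> row = new LinkedHashMap<>();
        for (Map<String, Double> source : sources) row.putAll(source);
        return row;
    }

    public static void main(String[] args) {
        Map<String, Double> realtime = Map.of("driver_speed", 12.3);   // e.g. streamed from Flink
        Map<String, Double> context  = Map.of("hour_of_day", 14.0);    // from the request
        Map<String, Double> offline  = Map.of("driver_ctr_7d", 0.031); // via KKV lookup
        // Composite feature the DSL might derive, e.g. a cross of two base features.
        Map<String, Double> composite = Map.of("speed_x_ctr", 12.3 * 0.031);
        Map<String, Double> row = assemble(realtime, context, offline, composite);
        System.out.println(row.size()); // 4
    }
}
```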

Key Components

KKV System: Provides online access to offline features with minute‑level freshness. It uses mmap memory‑mapping to handle billions of feature entries efficiently.
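The mmap idea can be illustrated with Java's MappedByteBuffer over a fixed‑width record file. The record layout below is an invented simplification of whatever format the KKV system actually uses, and the linear scan stands in for its real index:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapFeatureStore {

    static final int RECORD_BYTES = Long.BYTES + Double.BYTES; // fixed-width (key, value) record

    // Map the feature file read-only; the OS pages it in on demand,
    // so billions of entries need not fit in the JVM heap.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Linear scan for brevity; a real store would use sorted keys plus binary search or hashing.
    static double lookup(MappedByteBuffer buf, long key) {
        for (int off = 0; off + RECORD_BYTES <= buf.limit(); off += RECORD_BYTES) {
            if (buf.getLong(off) == key) return buf.getDouble(off + Long.BYTES);
        }
        return Double.NaN; // feature missing
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("kkv", ".bin");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            java.nio.ByteBuffer rec = java.nio.ByteBuffer.allocate(RECORD_BYTES);
            rec.putLong(42L).putDouble(0.031).flip();
            ch.write(rec);
        }
        System.out.println(lookup(map(file), 42L)); // 0.031
    }
}
```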

Hot‑Load: Abstracts ES script plug‑ins so that new JARs can be loaded on the fly via Dragonfly, avoiding full cluster restarts.
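At the heart of any hot‑load scheme is an atomic swap of the live scorer reference once the new JAR has been loaded (e.g. by a fresh URLClassLoader) and verified. This sketch shows only that swap, with plain lambdas standing in for classes loaded from JARs:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.DoubleUnaryOperator;

public class HotLoader {

    // Live traffic always reads through this reference; a file-distribution callback
    // swaps it after the new JAR has been loaded and verified. No restart needed.
    private final AtomicReference<DoubleUnaryOperator> scorer =
            new AtomicReference<>(x -> 0.0); // bootstrap placeholder

    double score(double x) { return scorer.get().applyAsDouble(x); }

    void swap(DoubleUnaryOperator fresh) { scorer.set(fresh); } // atomic swap

    public static void main(String[] args) {
        HotLoader loader = new HotLoader();
        loader.swap(x -> x * 2);             // v1 goes live
        System.out.println(loader.score(3)); // 6.0
        loader.swap(x -> x + 1);             // v2 hot-swapped under traffic
        System.out.println(loader.score(3)); // 4.0
    }
}
```

In-flight requests keep the scorer they started with, so updates never cause a half-swapped score.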

Execution Engine: Uses MLeap to serialize Spark‑ML models and serve them via a unified API. Supports both Spark‑ML and deep‑learning models.
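A hypothetical shape for such a unified API: a registry keyed by model name, where every backend (MLeap, deep learning, custom) implements the same predictor interface. MLeap's actual loading classes are not reproduced here; the lambda stands in for an MLeap‑loaded tree model:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExecutionEngine {

    // Unified contract every backend implements (illustrative, not MLeap's API).
    interface Predictor {
        double predict(double[] features);
    }

    private final Map<String, Predictor> models = new ConcurrentHashMap<>();

    void register(String name, Predictor p) { models.put(name, p); }

    double score(String name, double[] features) {
        Predictor p = models.get(name);
        if (p == null) throw new IllegalArgumentException("unknown model: " + name);
        return p.predict(features);
    }

    public static void main(String[] args) {
        ExecutionEngine engine = new ExecutionEngine();
        // Stand-in for a tree model: a single decision stump on feature 0.
        engine.register("coarse_tree_v1", f -> f[0] > 0.5 ? 0.9 : 0.1);
        System.out.println(engine.score("coarse_tree_v1", new double[]{0.7})); // 0.9
    }
}
```

The ranking script then only holds a model name and a feature row; swapping backends never touches caller code.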

File Distribution System (Dragonfly): Listens for file changes, downloads updated files, verifies MD5, and triggers business callbacks. Example annotation usage:

// Single-file listener: the callback fires whenever the model archive changes
@Dragonfly(storagePath = "model/lo_cc_deep_model_v1.tar.gz")
public void getDeepFMModel(File file, String path) throws IOException {
    super.getDeepFMModel(file, path);
}

// Multiple callbacks may watch the same file
@Dragonfly(storagePath = "model/lo_cc_deep_model_v1.tar.gz")
public void test(File file, String path) throws IOException {
    System.out.println("unit test 3");
}

// Directory monitoring, with an optional filter bean
@Dragonfly(dirPathMonitor = "model/deep", filterBean = ApolloFilter.class)
public void getDeepFMModel(File file, String path) throws IOException {
    super.getDeepFMModel(file, path);
}

Debug: Exposes detailed prediction information via the ES ScoreScript API, for example:

explanation.set(String.format("mleap(policyName=%s,params=%s,detail=%s)", this.sortPolicy, params, tempMap));

Stability Measures

Comprehensive load‑testing before release

Extreme stress tests on model, plugin, and feature changes

Gray‑release strategy using Dragonfly to control rollout per model or JAR

Model Loading Grouping

Different business models load only the required feature subsets by assigning indices to specific ES groups, reducing unnecessary resource consumption.

Performance Acceleration

Request cache for user features

Object reuse across rows

Model cache with pre‑allocated input/output schemas

Global cache via mmap for high‑frequency features
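The request cache for user features can be sketched as a per‑request memoizer: a user‑level lookup runs once even when thousands of candidate rows are scored. Names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class RequestFeatureCache {

    private final Map<String, Double> cache = new HashMap<>(); // one instance per search request
    private int misses = 0;

    // Compute each request-level feature at most once, however many rows are scored.
    double get(String key, Function<String, Double> loader) {
        return cache.computeIfAbsent(key, k -> {
            misses++; // only the first row pays for the lookup
            return loader.apply(k);
        });
    }

    int misses() { return misses; }

    public static void main(String[] args) {
        RequestFeatureCache cache = new RequestFeatureCache();
        for (int row = 0; row < 1000; row++) {
            cache.get("user_ctr_7d", k -> 0.042); // stands in for an expensive lookup
        }
        System.out.println(cache.misses()); // 1
    }
}
```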

Business Impact After Deployment

Support for all Spark models

Zero‑code model iteration through feature configuration, hot‑load, load‑testing, and gray‑release

Modular, pluggable algorithm components enabling multi‑stage ranking

Hot‑load ensures JAR updates without service jitter

Flame‑graph profiling shows ranking consumes only ~7% CPU on a single core

Tree model adds ~10 ms latency compared to LR, but yields a 1.2% AB lift in the core car‑pool scenario

Future Work

The team plans to address ES’s row‑by‑row limitation for deep‑learning models by developing a rescore plug‑in that can batch TensorFlow sessions. Integration with OpenVINO is also under investigation to provide a unified inference API across TensorFlow, PyTorch, PaddlePaddle, and Caffe.
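The batching idea can be sketched independently of any particular framework: collect the rescore window into one matrix and make a single model call, as a TensorFlow session or OpenVINO request would expect. The batched model below is a placeholder:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchRescorer {

    // Instead of one model call per document (ES's per-row script model),
    // hand the whole rescore window to the model in a single batched call.
    static double[] rescore(List<double[]> rows, Function<double[][], double[]> batchModel) {
        return batchModel.apply(rows.toArray(new double[0][]));
    }

    public static void main(String[] args) {
        List<double[]> window = new ArrayList<>();
        window.add(new double[]{1, 2});
        window.add(new double[]{3, 4});
        // Placeholder batched model: row-wise sums computed in one call.
        double[] scores = rescore(window, batch -> {
            double[] out = new double[batch.length];
            for (int i = 0; i < batch.length; i++)
                for (double v : batch[i]) out[i] += v;
            return out;
        });
        System.out.println(scores[0] + " " + scores[1]); // 3.0 7.0
    }
}
```

Batching amortizes per-call overhead, which is exactly where per-row scripting makes deep models uneconomical.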

The ultimate goal is to complete end‑to‑end ranking inside ES, achieving a “search‑to‑rank‑in‑one” system with maximal performance and flexibility.

Written by HelloTech

Official Hello technology account, sharing tech insights and developments.