
Building a Complete Machine Learning Application with OpenMLDB and OneFlow: JD High‑Potential User Purchase Intent Prediction

This tutorial demonstrates how to use OpenMLDB together with OneFlow to build an end‑to‑end machine‑learning pipeline for predicting high‑potential JD users' purchase intent, covering environment setup, data loading, SQL table creation, offline feature extraction, DeepFM model training, model serving, online feature extraction, deployment, and real‑time inference.


This guide walks through constructing a full machine‑learning workflow that predicts high‑potential JD user purchase intent by integrating OpenMLDB for data processing and OneFlow for model training and serving.

Environment preparation: Ensure a host with an NVIDIA GPU (driver ≥ 460), activate the conda environment with conda activate oneflow, and install OneFlow with python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/support_oneembedding_serving/cu102. Install the additional Python packages with pip install psutil petastorm pandas sklearn and pip install tritonclient xxhash geventhttpclient.

OpenMLDB Docker setup: Pull and run the OpenMLDB image, mapping the demo directory to /root/project (e.g., docker run -dit --name=demo --network=host -v $demodir:/root/project 4pdosc/openmldb:0.5.2 bash), then start the CLI with /work/openmldb/bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client.

Database and table creation (executed in the OpenMLDB CLI): > CREATE DATABASE JD_db; > USE JD_db; > CREATE TABLE action(...); > CREATE TABLE flattenRequest(...); > CREATE TABLE bo_user(...); > CREATE TABLE bo_action(...); > CREATE TABLE bo_product(...); > CREATE TABLE bo_comment(...);
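The summary elides the column lists of these six tables. As a minimal sketch of the pattern, the snippet below renders CREATE TABLE statements from a column specification; the columns shown are hypothetical placeholders, not the real JD demo schemas, which are much wider.

```python
# Hypothetical, abbreviated schemas -- placeholders for the elided column lists.
TABLE_SCHEMAS = {
    "action": [("reqId", "string"), ("eventTime", "timestamp"), ("actionValue", "int")],
    "flattenRequest": [("reqId", "string"), ("eventTime", "timestamp"), ("pair_id", "string")],
}

def create_table_sql(name, columns):
    """Render one OpenMLDB CREATE TABLE statement from a (name, type) column list."""
    cols = ", ".join(f"{c} {t}" for c, t in columns)
    return f"CREATE TABLE {name}({cols});"

statements = [create_table_sql(n, cols) for n, cols in TABLE_SCHEMAS.items()]
for s in statements:
    print(s)
```

Each rendered statement is what the tutorial executes at the CLI prompt after USE JD_db;.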

Offline data loading: Switch to offline mode (SET @@execute_mode='offline';) and load the Parquet files into the tables with LOAD DATA INFILE '/root/project/data/JD_data/action/*.parquet' INTO TABLE action options(format='parquet', header=true, mode='append'); (similar commands for the other tables). Offline loading runs as an asynchronous job, so use SHOW JOBS to confirm it has finished before proceeding.
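Since the tutorial repeats the same LOAD DATA statement for every table, the per-table commands can be generated mechanically. This sketch assumes each table's Parquet files live in a subdirectory named after the table under /root/project/data/JD_data, matching the action example above.

```python
# Tables created in the previous step.
TABLES = ["action", "flattenRequest", "bo_user", "bo_action", "bo_product", "bo_comment"]

def load_data_sql(table, base="/root/project/data/JD_data"):
    """Render the offline LOAD DATA statement for one table (assumed directory layout)."""
    return (f"LOAD DATA INFILE '{base}/{table}/*.parquet' INTO TABLE {table} "
            "options(format='parquet', header=true, mode='append');")

for t in TABLES:
    print(load_data_sql(t))
```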

Feature engineering: Write and execute a complex SQL query that joins the tables, defines windows, and computes features such as distinct counts, top‑1 ratios, and day‑of‑week flags. The query writes its results to /root/project/out/1 for later training.
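To make the window features concrete, here are pure-Python equivalents of three of the feature types named above, computed over one user's window of values. This is an illustration of what the SQL computes, not the SQL itself.

```python
from collections import Counter
from datetime import datetime, timezone

def distinct_count(values):
    """Number of distinct values in the window (SQL: count(distinct ...))."""
    return len(set(values))

def top1_ratio(values):
    """Share of the window taken by the most frequent value."""
    if not values:
        return 0.0
    (_, top), = Counter(values).most_common(1)
    return top / len(values)

def is_weekday(ts_ms, day):
    """Day-of-week flag for a millisecond timestamp; day: 0=Monday ... 6=Sunday."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).weekday() == day

window = ["phone", "phone", "book", "phone", "toy"]
print(distinct_count(window), top1_ratio(window))
```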

Data preprocessing for DeepFM: Run the script process_JD_out_full.sh $demodir/out/1 to convert the feature output into the Parquet format required by OneFlow, printing sample counts and table‑size arrays.
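The core idea of this step is to map each raw feature string to a bounded integer id and record per-column vocabulary sizes (the "table-size array" that sizes the embedding tables). A minimal sketch, assuming stdlib md5 as a stand-in for the xxhash the real script installs:

```python
import hashlib

NUM_BINS = 100  # illustrative hash-bucket count, not the demo's actual value

def feature_id(column, value, num_bins=NUM_BINS):
    """Deterministically map a (column, value) pair to an integer id in [0, num_bins)."""
    h = hashlib.md5(f"{column}={value}".encode()).hexdigest()
    return int(h, 16) % num_bins

rows = [{"city": "Beijing", "cat": "phone"},
        {"city": "Shanghai", "cat": "phone"}]
# Per-column count of distinct ids actually observed -- the "table size array".
table_sizes = {col: len({feature_id(col, r[col]) for r in rows}) for col in rows[0]}
print(table_sizes)
```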

Model training: Configure training parameters in train_deepfm.sh (batch size, learning rate, embedding size, etc.) and launch training with OneFlow's distributed launcher. The trained model is saved to $demodir/oneflow_process/model_out and the serving model to $demodir/oneflow_process/model/embedding/1/model.
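For readers unfamiliar with DeepFM, the FM half of the model sums a bias, first-order feature weights, and pairwise interactions of per-feature embedding vectors; the deep half is an MLP over the same embeddings. This sketch shows only the FM score in plain Python (it is not OneFlow's implementation), using the standard (sum v)^2 - sum(v^2) identity for the pairwise term.

```python
def fm_score(bias, weights, embeddings):
    """FM part of DeepFM: bias + sum of w_i + sum over pairs of v_i . v_j."""
    first_order = sum(weights)
    dim = len(embeddings[0])
    second_order = 0.0
    for d in range(dim):
        s = sum(v[d] for v in embeddings)        # sum of the d-th components
        sq = sum(v[d] ** 2 for v in embeddings)  # sum of their squares
        second_order += 0.5 * (s * s - sq)       # = sum over pairs of v_i[d] * v_j[d]
    return bias + first_order + second_order

# Two active features with 2-dimensional embeddings.
score = fm_score(0.1, [0.2, 0.3], [[1.0, 0.0], [0.5, 0.5]])
print(score)
```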

Model serving: Start Triton Inference Server with the OneFlow backend, mounting the model directory and required libraries, e.g., docker run --runtime=nvidia --rm --network=host -v $demodir/oneflow_process/model:/models -v /path/to/libtriton_oneflow.so:/backends/oneflow/libtriton_oneflow.so -v $demodir/oneflow_process/persistent:/root/demo/persistent registry.cn-beijing.aliyuncs.com/oneflow/triton-devel bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib /opt/tritonserver/bin/tritonserver --model-repository=/models --backend-directory=/backends'.
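Once Triton is up, it can be queried over its standard KServe-v2 REST endpoint (/v2/models/&lt;name&gt;/infer). A minimal client sketch follows; the model name "embedding" matches the directory layout above, but the input tensor name, shape, and datatype are assumptions, not values checked against the demo's config.pbtxt.

```python
import json
import urllib.request

def build_infer_payload(feature_ids):
    """KServe-v2 infer request body for one row of integer feature ids."""
    return {
        "inputs": [{
            "name": "INPUT_0",                 # assumed input tensor name
            "shape": [1, len(feature_ids)],
            "datatype": "INT64",
            "data": feature_ids,
        }]
    }

def infer(url, feature_ids):
    """POST the request to e.g. http://localhost:8000/v2/models/embedding/infer."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_infer_payload(feature_ids)).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_infer_payload([3, 7, 42])
print(json.dumps(payload))
```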

Online feature extraction and deployment: Restart the OpenMLDB CLI in online mode, load fresh data, and deploy the same feature‑extraction SQL as a service with DEPLOY demo SELECT * FROM (...). Verify the deployment with SHOW DEPLOYMENT demo;.
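The key point of this step is that the deployed service reuses the offline feature query verbatim, simply wrapped in a DEPLOY statement. A trivial sketch of that wrapping (the inner SELECT stays elided here, exactly as in the tutorial):

```python
def deploy_sql(name, feature_query):
    """Wrap a feature-extraction query in an OpenMLDB DEPLOY statement."""
    return f"DEPLOY {name} {feature_query}"

stmt = deploy_sql("demo", "SELECT * FROM (...)")
print(stmt)
```

Because the online service executes the identical SQL, offline training features and online serving features stay consistent by construction.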

Real‑time inference: Start the OpenMLDB prediction server (./start_predict_server.sh 0.0.0.0:9080) and send a request via the provided predict.py script, which queries OpenMLDB for features, calls the OneFlow model through Triton, and prints the predicted purchase‑intent score.
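The shape of predict.py's flow can be sketched with stubbed I/O: fetch features from the OpenMLDB deployment, score them with the Triton-served model, and report the result. Both stub bodies below are placeholders for the real HTTP calls, and the returned values are illustrative, not the demo's actual output.

```python
def fetch_features(req_id):
    """Stand-in for the OpenMLDB deployment call that returns feature ids."""
    return [3, 7, 42]

def score(features):
    """Stand-in for the Triton inference call that returns a probability."""
    return 0.87

def predict(req_id):
    features = fetch_features(req_id)
    return score(features)

print(f"purchase-intent score: {predict('req-001'):.2f}")
```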

An optional appendix details compiling OneFlow with OneEmbedding support and building the Triton OneFlow backend inside a Docker container, including the CMake commands and library‑path adjustments.

Docker · SQL · OneFlow · Model Serving · DeepFM · OpenMLDB · Feature Engineering
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
