
OpenMLDB: An Open‑Source Machine Learning Database for Consistent Online and Offline Feature Serving

This article presents OpenMLDB, an open-source machine learning database that unifies offline and online feature computation and serves features at millisecond-level latency. It outlines the project's development history and architecture, the enhancements in version 0.6.0, its ecosystem integrations, and real-world deployments in finance, banking, research, and marketing.

DataFunTalk

The talk by senior architect Lu Mian at the 4Paradigm Technology Day introduced OpenMLDB, an open‑source machine learning database that provides a consistent production feature platform for both online and offline environments.

The community now counts more than 100 contributors, a sign of steady growth in its open-source ecosystem since the project's first anniversary.

OpenMLDB was officially open-sourced in June of the previous year, building on four to five years of internal development dating back to 2017. Since then it has shipped six releases, published a VLDB paper, and attracted its first enterprise customer, Akulaku, along with feedback from JD Tech and other industry users.

At its core, OpenMLDB offers an end-to-end machine-learning feature pipeline that computes features in real time at millisecond-level latency, which is essential for AI-driven decisions. It closes the gap between offline feature scripts, typically written in Python or SparkSQL, and a low-latency online execution engine by putting both behind a unified SQL interface.

The architecture exposes a single programming language, SQL, backed by two processing engines: a batch engine, built on Spark and optimized for feature computation, and a real-time SQL engine, a custom time-series database. Both engines work from a consistent execution plan, so the same feature definition produces the same results offline and online, and a feature script developed offline can be deployed to online serving in one step.
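The consistency property described above can be illustrated with a small Python sketch. It is not OpenMLDB code: the `Txn` record, the 10-second window, and the `OnlineStore` class are hypothetical names chosen for illustration. The point is only that a single feature definition (`window_sum`) is reused by a batch path and a per-request path, so the two cannot drift apart.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Txn:
    card: str      # partition key (hypothetical)
    ts: int        # event time in seconds
    amount: float


WINDOW_SECONDS = 10  # hypothetical window length


def window_sum(history: List[Txn], current: Txn) -> float:
    """Single feature definition: sum of amounts for the same card
    within [ts - WINDOW_SECONDS, ts], including the current row."""
    total = current.amount
    for row in history:
        same_key = row.card == current.card
        in_window = current.ts - WINDOW_SECONDS <= row.ts <= current.ts
        if same_key and in_window:
            total += row.amount
    return total


def batch_features(rows: List[Txn]) -> List[float]:
    """Offline path: evaluate the feature for every row in time order,
    letting each row see only the rows that came before it."""
    ordered = sorted(rows, key=lambda r: r.ts)
    return [window_sum(ordered[:i], row) for i, row in enumerate(ordered)]


class OnlineStore:
    """Online path: evaluate the feature for one incoming row against
    stored history, then remember that row for later requests."""

    def __init__(self) -> None:
        self._rows: List[Txn] = []

    def request(self, row: Txn) -> float:
        value = window_sum(self._rows, row)
        self._rows.append(row)
        return value


rows = [
    Txn("a", 1, 10.0),
    Txn("b", 3, 5.0),
    Txn("a", 5, 20.0),
    Txn("a", 14, 1.0),
    Txn("a", 30, 2.0),
]

offline = batch_features(rows)
store = OnlineStore()
online = [store.request(r) for r in sorted(rows, key=lambda r: r.ts)]
print(offline)  # [10.0, 5.0, 30.0, 21.0, 2.0]
print(offline == online)  # True
```

In a real system the two paths are separate engines rather than two callers of one Python function, which is why sharing one definition and one execution plan matters.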

Version 0.6.0 adds several key enhancements: an intelligent diagnosis tool with service-status checks and automatic log collection; integration with Apache Airflow; extensive SQL syntax improvements, including EXCLUDE CURRENT_ROW, DELETE, and pre-aggregation support for aggregate functions with the _where suffix; new built-in functions char(int), char_length, character_length, radians, hex, and median; and two deployment modes (in-memory and RocksDB-based) that reduce cost. OpenMLDB also integrates with upstream data sources such as Kafka, Pulsar, HDFS, and S3, and with downstream monitoring tools such as Prometheus and Grafana.
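To make the EXCLUDE CURRENT_ROW option concrete, here is a minimal Python sketch of the semantics as we understand them: the window frame is built as usual, but the row being evaluated is left out of the aggregate. This illustrates the idea only, not OpenMLDB's implementation; the `window_rows` helper, the 10-second span, and the sample data are assumptions.

```python
from statistics import mean
from typing import List, Tuple

# (timestamp in seconds, amount); a single partition key is assumed
Row = Tuple[int, float]


def window_rows(history: List[Row], current: Row, span: int,
                exclude_current_row: bool = False) -> List[Row]:
    """Return the rows in the range window [ts - span, ts].

    `history` holds earlier rows only; the current row is appended
    unless `exclude_current_row` is set.
    """
    lower = current[0] - span
    rows = [r for r in history if lower <= r[0] <= current[0]]
    if not exclude_current_row:
        rows.append(current)
    return rows


history = [(1, 10.0), (5, 20.0), (9, 30.0)]
current = (12, 100.0)

with_current = sum(a for _, a in window_rows(history, current, span=10))
without_current = sum(
    a for _, a in window_rows(history, current, span=10,
                              exclude_current_row=True)
)

# A typical use: compare the current amount with the average of prior ones.
prior_mean = mean(
    a for _, a in window_rows(history, current, span=10,
                              exclude_current_row=True)
)
ratio = current[1] / prior_mean

print(with_current)     # 150.0
print(without_current)  # 50.0
print(ratio)            # 4.0
```

Excluding the current row is useful for features such as "how unusual is this transaction compared with recent history", where including the row itself would bias the baseline.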

Several customer case studies illustrate its impact. Akulaku processes nearly one billion orders per day with 4 ms latency. A bank's fraud-detection system computes features in under 20 ms. A national research institute handles 3 million writes per second and 3,000 queries per second on IoT data. A financial services firm cuts resource usage by a factor of 2 to 3. An internet company achieves millisecond-level risk control. MuShang Group builds an intelligent marketing platform on OpenMLDB for personalized recommendation, ad targeting, and churn prediction.

The article concludes with an invitation to join the OpenMLDB open‑source community, providing links to documentation, GitHub repositories, Airflow resources, and a WeChat discussion group.

Tags: real-time, machine learning, SQL, AI, Database, Feature Store, OpenMLDB