
OpenMLDB 0.4.0 Full-Process Features and Quick‑Start Guide for Building End‑to‑End Online AI Applications

This article introduces the new full‑process features of OpenMLDB 0.4.0, explains its unified online/offline storage, high‑availability task management, and end‑to‑end AI workflow, and provides step‑by‑step instructions for quickly deploying both single‑node and cluster versions to run a complete online AI application.

DataFunTalk

OpenMLDB 0.4.0 brings extensive performance optimizations, including LLVM‑based JIT for various CPU architectures, and a multi‑level skip‑list storage engine that greatly improves read/write speed for time‑series data.

The release adds three major new capabilities: (1) unified online and offline storage with a single table view, (2) a high‑availability offline task manager that supports Spark‑based jobs via extended SQL commands, and (3) a complete end‑to‑end AI workflow that can be driven by SDK or CLI.

The article walks through three parts: new full‑process features, quick‑start procedures for the single‑node and cluster editions, and a hands‑on workshop that builds an end‑to‑end AI application using a Kaggle taxi‑duration dataset.

For the single‑node version, a pre‑compiled binary or Docker image can be started with a script that launches the NameServer, API Server, and a single Tablet. The cluster edition adds ZooKeeper for metadata management, multiple Tablets for replication, and the TaskManager service for fault‑tolerant job execution.

Users can create databases and tables with standard ANSI SQL, switch to offline mode to import offline data (CSV or Parquet), perform feature extraction with window functions, export the resulting feature set, train models with LightGBM (or TensorFlow/PyTorch), and finally deploy the feature SQL as an online service.
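The offline half of the workflow described above can be sketched in SQL roughly as follows. This is an illustrative sketch, not output from the article: the database, table, and column names (demo_db, t_trip, vendor_id, and so on) and the file paths are hypothetical stand-ins for the Kaggle taxi-duration schema, and the exact OPTIONS accepted may vary by OpenMLDB version.

```sql
-- Hypothetical schema for the taxi-trip dataset (names illustrative)
CREATE DATABASE demo_db;
USE demo_db;
CREATE TABLE t_trip (id STRING, vendor_id INT, pickup_datetime TIMESTAMP,
                     passenger_count INT, trip_duration INT);

-- Switch to offline mode and import the training data
SET @@execute_mode='offline';
LOAD DATA INFILE '/data/taxi_train.csv' INTO TABLE t_trip
OPTIONS (format='csv', header=true, mode='append');

-- Extract time-window features and export them for model training
SELECT trip_duration, passenger_count,
       sum(trip_duration) OVER w AS dur_sum_1d,
       count(id) OVER w AS trip_cnt_1d
FROM t_trip
WINDOW w AS (PARTITION BY vendor_id ORDER BY pickup_datetime
             ROWS_RANGE BETWEEN 1d PRECEDING AND CURRENT ROW)
INTO OUTFILE '/data/feature_data' OPTIONS (mode='overwrite');
```

The exported feature set is then fed to LightGBM (or another trainer) outside the database; only the feature SQL itself is later pushed online.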

After deployment, recent online data can be backfilled into the in-memory store, and a lightweight HTTP/RPC server serves real-time predictions with the trained model, achieving sub-10 ms latency for time-series window aggregations.
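The deployment and backfill steps above can be sketched as follows. Again this is an illustrative sketch: the deployment name demo_service, the table and column names, and the file path are hypothetical, and the window clause is assumed to match the one used for offline feature extraction (OpenMLDB requires the deployed SQL to be consistent with the offline feature SQL).

```sql
-- Switch to online mode and deploy the feature SQL as a named service
SET @@execute_mode='online';
DEPLOY demo_service
SELECT trip_duration, passenger_count,
       sum(trip_duration) OVER w AS dur_sum_1d,
       count(id) OVER w AS trip_cnt_1d
FROM t_trip
WINDOW w AS (PARTITION BY vendor_id ORDER BY pickup_datetime
             ROWS_RANGE BETWEEN 1d PRECEDING AND CURRENT ROW);

-- Backfill recent data into the in-memory online store so that
-- window aggregations have history to draw on at request time
LOAD DATA INFILE '/data/taxi_recent.csv' INTO TABLE t_trip
OPTIONS (format='csv', header=true, mode='append');
```

Once deployed, the service can be invoked through the API Server's HTTP interface with a single row of raw input; OpenMLDB computes the window features in real time and the prediction server applies the trained model to them.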

The article concludes with a summary of the ten steps required to launch a full‑process AI application and invites readers to join the open‑source community on GitHub.

Tags: SQL, Distributed Systems, Time Series Database, OpenMLDB, Feature Engineering, AI Workflow
Written by

DataFunTalk

Dedicated to sharing and discussing applications of big data and AI technology, with the aim of empowering a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation and search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
