Design and Implementation of a High‑Availability Distributed Machine Learning Model Online Inference System
This article presents a complete technical solution for a distributed online inference system. Machine-learning models are packaged in Docker containers and orchestrated with Kubernetes for fault-tolerant, elastic scaling, while integrated model repositories, image registries, monitoring, and automated model selection streamline deployment, updates, and resource management.
With the rapid development of big data and AI, business scenarios such as financial risk control, online advertising, recommendation, and smart cities increasingly rely on machine-learning models. After training, these models must be packaged, deployed, and served online to solve real-world problems.
The paper proposes a complete technical scheme for a distributed machine-learning model online inference system. CPU/GPU compute nodes provide the underlying inference capacity, Docker containers encapsulate model inference tasks, and Kubernetes handles service orchestration, delivering distributed fault tolerance and elastic resource scaling. Integrated modules (model repository, container image repository, monitoring, service registration/discovery, and visual dashboards) decouple algorithms from the service architecture, simplifying deployment, updates, and management while improving stability, flexibility, and service capacity.
Existing deployment methods—direct deployment on physical machines, virtual machines, or containerized services—suffer from repeated environment setup, resource conflicts, low availability, and cumbersome manual updates. These issues motivate the need for a more automated, scalable solution.
The proposed high‑availability system follows a modular design:
- (A) Model Service Designer: visual configuration of inference services
- (B) Model Repository: versioned model storage
- (C) Container Image Repository: pre‑built runtime environments
- (D) Model Microservice Engine: pulls models and images and wraps them as containerized services
- (E) Kubernetes Cluster: scheduling and high availability
- (F) Underlying infrastructure: CPU/GPU clusters with Ceph/HDFS storage
- (G) Service Management: lifecycle operations
- (H) Load Balancer
- (I) Monitoring Module
- (J) Monitoring Dashboard
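To make the Model Microservice Engine's role concrete, the sketch below shows how such an engine might combine a model-repository entry with a pre-built runtime image into a Kubernetes-Deployment-style spec. This is a minimal illustration, not the paper's implementation; all names (`ModelEntry`, `build_deployment_spec`, the example URIs) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ModelEntry:
    """Hypothetical record from the model repository."""
    name: str         # model name in the repository
    version: str      # version tag, enabling rollback and A/B comparison
    storage_uri: str  # e.g. a Ceph/HDFS path holding the serialized model

def build_deployment_spec(model: ModelEntry, image: str, replicas: int = 2) -> dict:
    """Assemble a Kubernetes-Deployment-like spec wrapping the model as a service."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{model.name}-{model.version}"},
        "spec": {
            "replicas": replicas,  # >1 so the cluster tolerates node failures
            "template": {
                "spec": {
                    "containers": [{
                        "name": "inference",
                        # pre-built runtime pulled from the image repository
                        "image": image,
                        # the container fetches the model at startup
                        "env": [{"name": "MODEL_URI",
                                 "value": model.storage_uri}],
                    }]
                }
            },
        },
    }

spec = build_deployment_spec(
    ModelEntry("risk-score", "v3", "ceph://models/risk-score/v3"),
    "registry.example.com/runtime/sklearn:1.0",
)
```

Submitting such a spec to the Kubernetes API server lets the cluster handle scheduling, restarts, and replica placement, which is what decouples the algorithm side from the serving architecture.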
Automation of model selection and updates is achieved through five strategy templates (data‑driven, accuracy‑driven, periodic best‑performance, threshold‑based, and manual selection), allowing seamless model upgrades during low‑traffic periods.
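Two of these strategy templates can be sketched in a few lines. The functions below illustrate threshold-based and periodic best-performance selection over candidate model records; the field names and the 0.90 threshold are assumptions for the example, not values from the paper.

```python
def threshold_select(candidates, metric="accuracy", threshold=0.90):
    """Threshold-based template: promote the newest candidate whose
    metric clears the threshold, or None if no candidate qualifies."""
    eligible = [c for c in candidates if c[metric] >= threshold]
    return max(eligible, key=lambda c: c["trained_at"]) if eligible else None

def best_performance_select(candidates, metric="accuracy"):
    """Periodic best-performance template: a scheduled job simply picks
    the highest-scoring model regardless of how recently it was trained."""
    return max(candidates, key=lambda c: c[metric]) if candidates else None

candidates = [
    {"version": "v1", "accuracy": 0.88, "trained_at": 1},
    {"version": "v2", "accuracy": 0.93, "trained_at": 2},
    {"version": "v3", "accuracy": 0.91, "trained_at": 3},
]
# threshold_select picks v3 (newest above 0.90);
# best_performance_select picks v2 (highest accuracy)
```

Whichever template fires, the actual swap can then be scheduled for a low-traffic window, as the article describes.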
For resource elasticity, the system monitors real‑time metrics (CPU/GPU usage, memory, latency) and computes the desired number of container instances using a weighted formula, then leverages Kubernetes Horizontal Pod Autoscaling (HPA) to adjust resources dynamically, reducing waste while meeting service demand.
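An HPA-style scaling rule along these lines can be sketched as follows. Kubernetes HPA scales by the ratio of an observed metric to its target (desired = ceil(current × observed/target)); here that ratio is replaced by a weighted combination over several metrics. The specific weights, targets, and normalization below are assumptions for illustration, not the paper's published formula.

```python
import math

def desired_replicas(current_replicas, metrics, targets, weights):
    """HPA-style rule: scale current replicas by a weighted average of
    per-metric (observed / target) ratios, never dropping below 1."""
    ratio = sum(weights[k] * (metrics[k] / targets[k]) for k in weights)
    ratio /= sum(weights.values())  # normalize so weights need not sum to 1
    return max(1, math.ceil(current_replicas * ratio))

# Example: CPU is well over target, GPU and latency slightly over,
# so the weighted ratio is 1.40 and 4 replicas scale up to 6.
replicas = desired_replicas(
    current_replicas=4,
    metrics={"cpu": 0.85, "gpu": 0.60, "latency_ms": 120},
    targets={"cpu": 0.50, "gpu": 0.50, "latency_ms": 100},
    weights={"cpu": 0.4, "gpu": 0.3, "latency_ms": 0.3},
)
```

When all metrics sit at their targets the ratio is 1.0 and the replica count is unchanged, which matches HPA's steady-state behavior.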
In conclusion, the solution delivers a container‑based, Kubernetes‑orchestrated, fault‑tolerant, and elastically scalable model inference platform that simplifies deployment and management, automates model selection and updates, and optimizes resource utilization.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.