dl_inference: Open‑Source General Deep Learning Inference Service
dl_inference is an open‑source inference platform that simplifies deploying TensorFlow and PyTorch models in production. It offers a unified gRPC entry point, load‑balanced multi‑node serving, GPU and CPU deployment options, customizable pre‑ and post‑processing, and an architecture designed to extend to future AI workloads.
dl_inference is a general deep‑learning inference service launched by 58.com, built to bring TensorFlow and PyTorch models into production quickly.
Project details
GitHub repository: https://github.com/wuba/dl_inference
Supports both GPU and CPU deployment modes.
Handles multi‑node deployment with dynamic weighted round‑robin load balancing, serving over a billion online requests daily.
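The dynamic weighted round‑robin described above can be sketched with the smooth weighted round‑robin algorithm (the scheme nginx popularized). This is an illustrative sketch, not dl_inference's actual implementation; the node names and weights are hypothetical, and `set_weight` stands in for whatever health‑check signal adjusts a node's share of traffic.

```python
class SmoothWeightedRR:
    """Smooth weighted round-robin: each pick boosts every node's current
    score by its weight, selects the highest score, then subtracts the
    total weight from the winner. Over one cycle, each node is chosen
    exactly in proportion to its weight."""

    def __init__(self, nodes):
        # nodes: {name: weight}; every node starts with a current score of 0
        self.weights = dict(nodes)
        self.current = {name: 0 for name in nodes}

    def pick(self):
        total = sum(self.weights.values())
        for name, weight in self.weights.items():
            self.current[name] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best

    def set_weight(self, name, weight):
        # A health check can lower a degraded node's weight at runtime,
        # shifting traffic away without removing the node entirely.
        self.weights[name] = weight

# Hypothetical cluster: one strong GPU node and two lighter nodes.
balancer = SmoothWeightedRR({"gpu-node-1": 5, "gpu-node-2": 1, "cpu-node-1": 1})
picks = [balancer.pick() for _ in range(7)]
```

Over the seven picks of one full cycle, the heavier node receives five requests and each lighter node receives one, without ever sending long uninterrupted bursts to a single node.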
Key features
Simplifies deployment of deep‑learning model inference services.
Supports multi‑node deployment with built‑in load‑balancing.
Provides a unified RPC service interface.
Offers both GPU and CPU deployment options.
For PyTorch models, supports custom pre‑ and post‑processing and exposes the model‑invocation logic for customization.
Architecture
The system consists of three modules: a unified access service (gRPC entry point), a TensorFlow inference service, and a PyTorch inference service. The unified service defines common interfaces for both frameworks and performs dynamic load balancing based on node health.
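The unified access service's role can be illustrated as a single entry point that dispatches each request to the appropriate framework backend. The handler names and request shape below are assumptions for illustration, not dl_inference's actual gRPC interface.

```python
# Stand-ins for the TensorFlow and PyTorch inference backends.
def tf_infer(payload):
    return {"framework": "tensorflow", "result": payload}

def torch_infer(payload):
    return {"framework": "pytorch", "result": payload}

# The unified service maps a framework identifier in the request
# to the matching backend, so clients use one interface for both.
BACKENDS = {"tensorflow": tf_infer, "pytorch": torch_infer}

def infer(framework, payload):
    if framework not in BACKENDS:
        raise ValueError(f"unsupported framework: {framework}")
    return BACKENDS[framework](payload)
```

In the real system this dispatch sits behind gRPC and is combined with the health‑aware load balancing described above to choose a concrete node for the selected framework.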
TensorFlow inference
Uses TensorFlow Serving (Docker or bare‑metal) to serve SavedModel files, supports hot model updates, gRPC/REST APIs, and can be extended with custom operators by recompiling TensorFlow‑Serving.
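Alongside gRPC, TensorFlow Serving exposes a REST predict endpoint of the form `/v1/models/<name>:predict`. The sketch below only builds such a request (no network call is made); the model name `half_plus_two` is TensorFlow Serving's standard demo model, used here as an example.

```python
import json

def build_predict_request(model_name, instances, version=None):
    """Build the path and JSON body for a TensorFlow Serving REST
    predict call; a specific model version can optionally be pinned."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    path += ":predict"
    body = json.dumps({"instances": instances})
    return path, body

# Example: ask version 1 of the demo model to score two inputs.
path, body = build_predict_request("half_plus_two", [[1.0], [2.0]], version=1)
```

A client would POST this body to the serving container's REST port (8501 by default); the gRPC API on port 8500 carries the same request as a protobuf `PredictRequest` instead.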
PyTorch inference
Since PyTorch lacks a native serving component, dl_inference wraps PyTorch models with Seldon, exposing a gRPC SeldonMessage protocol. It provides optional pre‑ and post‑processing scripts and allows custom model execution logic.
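Seldon's Python wrapper convention is a user class whose `predict()` method the server invokes per request. The sketch below shows how pre‑ and post‑processing hooks fit around model invocation; the hooks, the class name, and the stand‑in model function are illustrative assumptions, not dl_inference's actual code.

```python
class PyTorchServer:
    """Seldon-style model wrapper: the serving layer calls predict(),
    which chains preprocessing, model invocation, and postprocessing."""

    def __init__(self):
        # A real deployment would load the model here, e.g.
        # torch.load("model.pth"); a plain function stands in so this
        # sketch runs without PyTorch installed.
        self.model = lambda xs: [x * 2 for x in xs]

    def preprocess(self, X):
        # Custom input handling, e.g. normalization or tokenization.
        return [float(x) for x in X]

    def postprocess(self, y):
        # Custom output handling, e.g. mapping raw scores to labels.
        return [round(v, 3) for v in y]

    def predict(self, X, feature_names=None):
        return self.postprocess(self.model(self.preprocess(X)))

server = PyTorchServer()
result = server.predict(["1", "2.5"])  # raw strings in, floats out
```

Overriding `predict()` itself is where the "custom model execution logic" hook comes in: users can replace the whole chain when a model needs nonstandard invocation.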
Deployment steps
Both TensorFlow and PyTorch models are deployed via Docker containers. For TensorFlow, prepare a SavedModel, pull the TensorFlow‑Serving image, mount the model directory, and run the container. For PyTorch, place the model file (model.pth) and custom interface scripts in a directory, build the image from the provided Dockerfile, and start the service with the supplied script.
Future roadmap
Support Caffe models on GPU and CPU.
Accelerate CPU inference using Intel MKL, OpenVINO, etc.
Accelerate GPU inference with NVIDIA TensorRT.
Contribution & feedback
Contributions are welcomed via pull requests or issues on the GitHub repository, or by emailing [email protected].
Authors
Feng Yu – Senior Backend Engineer, AI Lab, 58.com
Chen Xingzhen – Backend Architect, AI Lab, 58.com
Chen Zelong – Backend Engineer, AI Lab, 58.com
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.