
TensorRT Acceleration and Integration Design for the 58 AI Platform (WPAI)

This article explains how the 58 AI platform leverages NVIDIA TensorRT to accelerate deep‑learning inference on GPUs, describes three integration approaches, details the TF‑TRT implementation and Kubernetes deployment, and presents performance gains for ResNet‑50 and OCR models.

58 Tech

The 58 AI platform (WPAI) provides a one‑stop solution for algorithm development, supporting both machine‑learning and deep‑learning pipelines, with GPU/CPU debugging, offline training, and online inference capabilities.

TensorRT (TRT) is NVIDIA's CUDA-based inference engine for GPUs. It accepts models trained in major frameworks such as TensorFlow, PyTorch, Caffe2, and MXNet, but it accelerates inference only, not training.

TRT improves GPU inference by optimizing the computation graph, converting precision to FP16/INT8, selecting optimal CUDA kernels, and managing GPU memory more efficiently.
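To make the precision-conversion step concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization, the basic idea behind TRT's INT8 mode. It is illustrative only: TRT's real calibrator chooses the dynamic range by minimizing information loss over calibration data, not simply from `max(|x|)`.

```python
# Illustrative sketch of symmetric INT8 quantization (the idea behind
# TensorRT's INT8 precision mode; TRT's actual calibration is smarter).

def quantize_int8(values):
    """Map a list of floats to INT8 with one symmetric per-tensor scale."""
    amax = max(abs(v) for v in values)        # dynamic range of the tensor
    scale = amax / 127.0 if amax else 1.0     # one float step per INT8 unit
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)                 # close to weights, within scale/2
```

The reconstruction error is bounded by half a quantization step (`scale / 2`), which is why INT8 works well for many vision models yet still benefits from calibration to pick a good range.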

Three integration methods are described: (1) using the TRT integration built into frameworks such as TensorFlow (TF-TRT), (2) exporting the model to an intermediate format (e.g., ONNX) and importing it into TRT, and (3) constructing the network directly with TRT's C++/Python API. The WPAI platform adopts the first method (TF-TRT).
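The first approach boils down to a single conversion pass over a TensorFlow SavedModel. A hedged sketch of that pass using TensorFlow 2.x's TF-TRT converter (the function name and directory paths are hypothetical, not WPAI's actual code; it requires a TensorFlow build with TensorRT support, and in early 2.x releases the precision is passed via a `ConversionParams` object rather than a direct keyword):

```python
# Sketch of a TF-TRT conversion (approach 1). Assumes TensorFlow 2.x built
# with TensorRT support; paths and names here are placeholders.

def convert_saved_model(input_dir, output_dir, precision="FP16"):
    # Lazy import so the function can be defined without TensorFlow installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        precision_mode=precision,        # "FP32", "FP16", or "INT8"
    )
    converter.convert()                  # replace TRT-compatible subgraphs
    converter.save(output_dir)           # emit a TF-Serving-ready SavedModel
```

For INT8, TF-TRT additionally needs a calibration input function supplying representative data; FP16 conversion needs no calibration.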

The TF‑TRT workflow converts a TensorFlow SavedModel into an optimized TRT engine, which is then served via TensorFlow‑Serving in a Kubernetes environment; an InitContainer performs the conversion and stores the optimized model in an emptyDir volume for the main container.
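The deployment pattern described above can be sketched as a pod spec in which an init container runs the conversion and hands the optimized model to TensorFlow-Serving through a shared emptyDir volume. Image names, paths, and the model name below are hypothetical placeholders, not WPAI's actual manifests:

```yaml
# Hypothetical sketch of the InitContainer pattern described above.
apiVersion: v1
kind: Pod
metadata:
  name: tf-trt-serving
spec:
  volumes:
    - name: model-store
      emptyDir: {}              # shared between init and main containers
  initContainers:
    - name: trt-convert         # runs the TF-TRT conversion once at startup
      image: example/tf-trt-converter:latest     # hypothetical image
      command: ["python", "convert.py",
                "--input", "/models/raw",
                "--output", "/models/optimized"]
      volumeMounts:
        - name: model-store
          mountPath: /models
  containers:
    - name: tf-serving          # serves the optimized SavedModel
      image: tensorflow/serving:latest-gpu
      args: ["--model_base_path=/models/optimized", "--model_name=resnet50"]
      volumeMounts:
        - name: model-store
          mountPath: /models
```

The emptyDir keeps the optimized engine local to the pod's lifetime, which suits TRT well: engines are tuned to the specific GPU they are built on, so rebuilding at pod startup avoids shipping a stale engine to mismatched hardware.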

Performance tests on an NVIDIA P40 GPU show that TF‑TRT speeds up ResNet‑50‑v1 inference by 1.8× in FP32, 3.2× in INT8, and reduces latency for an OCR detection model by 45% while increasing QPS by 62%.
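The latency and QPS figures measure different things and need not move in lockstep: a 45% latency cut would imply roughly a 1.82× speedup if requests were strictly serial, while the measured +62% QPS reflects batched, concurrent serving. A small sketch of that arithmetic:

```python
# Sanity-check arithmetic for the reported OCR numbers (illustrative only).

def latency_speedup(latency_reduction):
    """Single-stream speedup implied by a fractional latency reduction."""
    return 1.0 / (1.0 - latency_reduction)

def qps_gain(base_qps, new_qps):
    """Fractional throughput improvement between two measured QPS values."""
    return new_qps / base_qps - 1.0

# A 45% latency cut implies ~1.82x for strictly serial requests...
serial_factor = latency_speedup(0.45)
# ...while a +62% QPS gain is a 1.62x throughput factor; under concurrent,
# batched serving the two need not coincide.
throughput_factor = 1.0 + qps_gain(100.0, 162.0)
```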

The article concludes that TF‑TRT delivers significant gains for image classification and object detection models, though benefits vary with model structure, and future work will add native TRT support to the platform.

Reference: NVIDIA TensorRT official documentation.

Author: Chen Xingzhen, AI Lab backend architect at 58.com.

Tags: model optimization, GPU inference, TensorRT, AI Platform, Kubernetes deployment, TF-TRT
Written by 58 Tech, the official tech channel of 58, a platform for tech innovation, sharing, and communication.