
TensorRT Acceleration and Integration Design for the 58 AI Platform (WPAI)

This article explains how the 58 AI platform leverages NVIDIA TensorRT to accelerate deep‑learning inference on GPUs, describes three integration approaches, details the TF‑TRT implementation and Kubernetes deployment, and presents performance gains for ResNet‑50 and OCR models.

58 Tech

The 58 AI platform (WPAI) provides a one‑stop solution for algorithm development, supporting both machine‑learning and deep‑learning pipelines, with GPU/CPU debugging, offline training, and online inference capabilities.

TensorRT (TRT) is NVIDIA's CUDA-based inference engine for GPUs. It accepts models trained in major frameworks such as TensorFlow, PyTorch, Caffe2, and MXNet, but it accelerates inference only, not training.

TRT improves GPU inference by optimizing the computation graph, converting precision to FP16/INT8, selecting optimal CUDA kernels, and managing GPU memory more efficiently.
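To make the precision-conversion step concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization, the basic idea behind TRT's INT8 mode. It is illustrative only: TRT's real calibrator chooses the dynamic range by minimizing information loss over calibration data, not simply from `max(|x|)`.

```python
# Illustrative sketch of symmetric INT8 quantization (the idea behind
# TensorRT's INT8 precision mode; TRT's actual calibration is smarter).

def quantize_int8(values):
    """Map a list of floats to INT8 with one symmetric per-tensor scale."""
    amax = max(abs(v) for v in values)        # dynamic range of the tensor
    scale = amax / 127.0 if amax else 1.0     # one float step per INT8 unit
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)                 # close to weights, within scale/2
```

The reconstruction error is bounded by half a quantization step (`scale / 2`), which is why INT8 works well for many vision models yet still benefits from calibration to pick a good range.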

Three integration methods are described: (1) using the TRT integration built into frameworks such as TensorFlow (TF-TRT), (2) exporting the model to an intermediate format (e.g., ONNX) and importing it into TRT, and (3) constructing the network directly with TRT's C++/Python API. The WPAI platform adopts the first method (TF-TRT).
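The first approach boils down to a single conversion pass over a TensorFlow SavedModel. A hedged sketch of that pass using TensorFlow 2.x's TF-TRT converter (the function name and directory paths are hypothetical, not WPAI's actual code; it requires a TensorFlow build with TensorRT support, and in early 2.x releases the precision is passed via a `ConversionParams` object rather than a direct keyword):

```python
# Sketch of a TF-TRT conversion (approach 1). Assumes TensorFlow 2.x built
# with TensorRT support; paths and names here are placeholders.

def convert_saved_model(input_dir, output_dir, precision="FP16"):
    # Lazy import so the function can be defined without TensorFlow installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        precision_mode=precision,        # "FP32", "FP16", or "INT8"
    )
    converter.convert()                  # replace TRT-compatible subgraphs
    converter.save(output_dir)           # emit a TF-Serving-ready SavedModel
```

For INT8, TF-TRT additionally needs a calibration input function supplying representative data; FP16 conversion needs no calibration.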

The TF‑TRT workflow converts a TensorFlow SavedModel into an optimized TRT engine, which is then served via TensorFlow‑Serving in a Kubernetes environment; an InitContainer performs the conversion and stores the optimized model in an emptyDir volume for the main container.
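The deployment pattern described above can be sketched as a pod spec in which an init container runs the conversion and hands the optimized model to TensorFlow-Serving through a shared emptyDir volume. Image names, paths, and the model name below are hypothetical placeholders, not WPAI's actual manifests:

```yaml
# Hypothetical sketch of the InitContainer pattern described above.
apiVersion: v1
kind: Pod
metadata:
  name: tf-trt-serving
spec:
  volumes:
    - name: model-store
      emptyDir: {}              # shared between init and main containers
  initContainers:
    - name: trt-convert         # runs the TF-TRT conversion once at startup
      image: example/tf-trt-converter:latest     # hypothetical image
      command: ["python", "convert.py",
                "--input", "/models/raw",
                "--output", "/models/optimized"]
      volumeMounts:
        - name: model-store
          mountPath: /models
  containers:
    - name: tf-serving          # serves the optimized SavedModel
      image: tensorflow/serving:latest-gpu
      args: ["--model_base_path=/models/optimized", "--model_name=resnet50"]
      volumeMounts:
        - name: model-store
          mountPath: /models
```

The emptyDir keeps the optimized engine local to the pod's lifetime, which suits TRT well: engines are tuned to the specific GPU they are built on, so rebuilding at pod startup avoids shipping a stale engine to mismatched hardware.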

Performance tests on an NVIDIA P40 GPU show that TF‑TRT speeds up ResNet‑50‑v1 inference by 1.8× in FP32, 3.2× in INT8, and reduces latency for an OCR detection model by 45% while increasing QPS by 62%.
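The latency and QPS figures measure different things and need not move in lockstep: a 45% latency cut would imply roughly a 1.82× speedup if requests were strictly serial, while the measured +62% QPS reflects batched, concurrent serving. A small sketch of that arithmetic:

```python
# Sanity-check arithmetic for the reported OCR numbers (illustrative only).

def latency_speedup(latency_reduction):
    """Single-stream speedup implied by a fractional latency reduction."""
    return 1.0 / (1.0 - latency_reduction)

def qps_gain(base_qps, new_qps):
    """Fractional throughput improvement between two measured QPS values."""
    return new_qps / base_qps - 1.0

# A 45% latency cut implies ~1.82x for strictly serial requests...
serial_factor = latency_speedup(0.45)
# ...while a +62% QPS gain is a 1.62x throughput factor; under concurrent,
# batched serving the two need not coincide.
throughput_factor = 1.0 + qps_gain(100.0, 162.0)
```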

The article concludes that TF‑TRT delivers significant gains for image classification and object detection models, though benefits vary with model structure, and future work will add native TRT support to the platform.

Reference: NVIDIA TensorRT official documentation.

Author: Chen Xingzhen, AI Lab backend architect at 58.com.

Tags: model optimization, GPU inference, TensorRT, AI Platform, Kubernetes deployment, TF-TRT
Written by 58 Tech, the official tech channel of 58, a platform for tech innovation, sharing, and communication.