Tag

deep learning compiler


JD Tech
Mar 18, 2024 · Artificial Intelligence

High‑Performance Inference Architecture: Distributed Graph Heterogeneous Computing Framework and GPU Multi‑Stream Optimization

The article describes how JD’s advertising team tackled the high‑concurrency, low‑latency challenges of online recommendation inference by designing a distributed graph heterogeneous computing framework, optimizing GPU kernel launches with TensorBatch, applying deep‑learning compiler techniques, and adopting a multi‑stream GPU architecture, which together yielded significant throughput and latency improvements.

AI inference · GPU optimization · High Performance
14 min read
JD Retail Technology
Jan 25, 2024 · Artificial Intelligence

Optimizing High‑Concurrency Online Inference for Recommendation Models with Distributed Heterogeneous Computing and GPU Acceleration

This article describes how JD Retail's advertising technology team tackled the high‑compute demands of modern recommendation models by designing a distributed graph‑partitioned heterogeneous computing framework, introducing TensorBatch request aggregation, leveraging deep‑learning compiler bucketing and asynchronous compilation, and implementing a multi‑stream GPU architecture to dramatically improve online inference throughput and latency.

GPU Acceleration · Online Inference · Recommendation systems
13 min read