iQIYI Technical Product Team
Mar 15, 2024 · Artificial Intelligence
Optimizing GPU Inference for CTR Models: Kernel Fusion, Multi‑Stream Execution, and Batch Merging
By fusing sparse-feature operators, enabling multi-stream execution, consolidating data copies, and merging inference batches, iQIYI overcame the launch-overhead bottlenecks of GPU inference for CTR models: latency fell to CPU level, throughput rose more than sixfold, and operational costs dropped by over 40%.
GPU · Kernel Fusion · TensorFlow
10 min read