iQIYI Technical Product Team
Dec 21, 2018 · Artificial Intelligence
CPU-Based Optimization of Deep Learning Inference Services
To ease GPU scarcity, iQIYI’s cloud platform migrated deep‑learning inference services to CPUs and applied optimizations at the system level (MKL‑DNN, OpenVINO), the application level (thread count, batch size, NUMA binding), and the algorithm level (pruning, quantization), delivering 1‑9× speedups across thousands of cores while preserving latency and accuracy.
CPU · MKL-DNN · OpenVINO
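As a taste of the application-level tuning the article describes, the sketch below shows how CPU inference services are commonly configured before a framework loads: capping OpenMP threads to the physical core count and pinning them via Intel OpenMP environment variables. This is an illustrative example of the general technique, not iQIYI's exact configuration; the specific values are assumptions.

```python
import os

# Assume 2-way hyper-threading: use one worker thread per physical core,
# since MKL-DNN kernels typically gain little from logical cores.
physical_cores = (os.cpu_count() or 2) // 2 or 1

# Standard Intel OpenMP controls; they must be set before the
# deep-learning framework (and its MKL runtime) is imported.
os.environ["OMP_NUM_THREADS"] = str(physical_cores)
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads to cores
os.environ["KMP_BLOCKTIME"] = "1"  # ms a thread spins before sleeping

print(f"inference configured for {physical_cores} threads")
```

On multi-socket machines, services are additionally bound to one NUMA node (e.g. with `numactl --cpunodebind`) so each worker's threads and memory stay local, which is one of the NUMA measures the abstract mentions.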