Tag

Allreduce

1 view collected around this technical thread.

Didi Tech
Jun 8, 2018 · Artificial Intelligence

DiDi PS: High-Performance RDMA-Based Parameter Server for Distributed Deep Learning

DiDi PS is a custom RDMA-based parameter server that uses a ring topology and optimized ibverbs communication to accelerate distributed deep-learning training. The article reports that it outperforms OpenMPI, NCCL2, TensorFlow's built-in RDMA, and Horovod, and that it keeps synchronization more stable and scalable on massive data workloads.
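To make the ring topology mentioned in the summary concrete, here is a minimal sketch of a generic ring allreduce, simulated in a single Python process. It is not DiDi PS's implementation: there is no RDMA or ibverbs involved, the function name `ring_allreduce` is hypothetical, and the workers are plain lists. It only illustrates the two phases such a ring typically runs, reduce-scatter followed by all-gather, in which each worker talks only to its right-hand neighbor.

```python
def ring_allreduce(worker_grads):
    """Simulate a sum-allreduce over a logical ring of workers.

    worker_grads: one equal-length list of numbers per worker.
    Returns a new list of lists; every worker ends with the element-wise sum.
    (Hypothetical helper for illustration, not an API of any real library.)
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    assert all(len(g) == length for g in worker_grads)

    bufs = [list(g) for g in worker_grads]  # work on copies
    # Chunk c covers indices [bounds[c], bounds[c + 1]).
    bounds = [c * length // n for c in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. After n - 1 steps, worker r holds the
    # fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot what each worker sends so all sends in a step are "simultaneous".
        outgoing = []
        for r in range(n):
            c = (r - step) % n
            outgoing.append((c, [bufs[r][i] for i in chunk(c)]))
        for r in range(n):
            c, data = outgoing[r]
            dst = (r + 1) % n  # right-hand neighbor on the ring
            for i, v in zip(chunk(c), data):
                bufs[dst][i] += v

    # Phase 2: all-gather. Each fully summed chunk travels once around the ring.
    for step in range(n - 1):
        outgoing = []
        for r in range(n):
            c = (r + 1 - step) % n
            outgoing.append((c, [bufs[r][i] for i in chunk(c)]))
        for r in range(n):
            c, data = outgoing[r]
            dst = (r + 1) % n
            for i, v in zip(chunk(c), data):
                bufs[dst][i] = v  # overwrite with the already reduced values

    return bufs


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, buf in enumerate(ring_allreduce(grads)):
        print(rank, buf)
```

In this scheme each worker sends 2 × (n − 1) chunks of roughly 1/n of the buffer, so per-worker traffic stays close to twice the buffer size regardless of the number of workers, which is the usual argument for ring-based synchronization.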

Allreduce · RDMA · TensorFlow
10 min read