Tag

KVStore


360 Tech Engineering
May 10, 2019 · Artificial Intelligence

Distributed Training with MXNet: Data Parallel on Single and Multi‑Node GPUs and Integration with Kubeflow

This article explains how MXNet supports data‑parallel training on single‑machine multi‑GPU and multi‑machine multi‑GPU setups, describes KVStore modes, outlines the worker‑server‑scheduler architecture, and shows how to launch large‑scale distributed training using Kubeflow and the mxnet‑operator.

Data Parallel · GPU · KVStore
11 min read