Tag

MXNet

0 views collected around this technical thread.

360 Tech Engineering
360 Tech Engineering
May 10, 2019 · Artificial Intelligence

Distributed Training with MXNet: Data Parallel on Single and Multi‑Node GPUs and Integration with Kubeflow

This article explains how MXNet supports data‑parallel training on single‑machine multi‑GPU and multi‑machine multi‑GPU setups, describes KVStore modes, outlines the worker‑server‑scheduler architecture, and shows how to launch large‑scale distributed training using Kubeflow and the mxnet‑operator.

Data ParallelGPUKVStore
0 likes · 11 min read
Distributed Training with MXNet: Data Parallel on Single and Multi‑Node GPUs and Integration with Kubeflow
360 Zhihui Cloud Developer
360 Zhihui Cloud Developer
May 9, 2019 · Artificial Intelligence

Master Distributed MXNet Training with Kubeflow: A Step‑by‑Step Guide

Learn how to perform single‑machine multi‑GPU and multi‑node multi‑GPU training with MXNet, understand KVStore modes, configure workers, servers, and schedulers, and deploy large‑scale distributed training on Kubernetes using Kubeflow, including operator installation, task creation, and performance considerations.

GPUKubeflowKubernetes
0 likes · 11 min read
Master Distributed MXNet Training with Kubeflow: A Step‑by‑Step Guide