Tag: MPI


Architects' Tech Alliance
Apr 17, 2023 · Fundamentals

Overview of High‑Performance Computing (HPC): Architecture, Metrics, Cluster Management, Job Scheduling, and Parallel Programming Models

This article provides a comprehensive overview of high‑performance computing, covering system architectures, hardware components, performance metrics, network topologies, common parallel file systems, cluster management functions, mainstream job‑scheduling systems, and MPI‑based parallel programming models.

HPC · High Performance Computing · Job Scheduling
14 min read
Architects' Tech Alliance
May 3, 2022 · Fundamentals

High‑Performance Computing Overview and Resource Guide

This article provides a comprehensive overview of high‑performance computing (HPC), covering its definition, hardware architectures, performance metrics, cluster components, parallel file systems, management and scheduling tools, as well as common MPI implementations and links to further technical resources.

FLOPS · File Systems · HPC
11 min read
DataFunSummit
Nov 29, 2021 · Artificial Intelligence

Horovod Distributed Training Plugin: Design, Usage, and Deadlock Prevention

This article reviews Horovod, a popular third‑party distributed deep‑learning training plugin, explaining its simple three‑line integration, the challenges of deadlocks in all‑reduce operations, and the architectural components—including background threads, coordinators, and MPI/Gloo controllers—that enable scalable and efficient data‑parallel training.

Data Parallel · Deep Learning · Gloo
8 min read
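The coordinator design mentioned in the summary above can be sketched in plain Python (a simulation of the scheduling idea, not Horovod's actual implementation; `coordinator_schedule` and the tensor names are illustrative): an all-reduce for a gradient is issued only once every worker has requested it, so workers that enqueue gradients in different orders cannot deadlock each other.

```python
from collections import defaultdict

def coordinator_schedule(request_streams):
    """Simulate a Horovod-style coordinator.

    request_streams[r] is the ordered list of tensor names worker r
    asks to all-reduce. An all-reduce is issued only when all workers
    have requested that tensor, so mismatched request orders stall
    temporarily instead of deadlocking.
    """
    num_workers = len(request_streams)
    streams = [list(s) for s in request_streams]
    pending = defaultdict(set)   # tensor name -> ranks that requested it
    issued = []                  # order in which all-reduces actually run
    progress = True
    while progress:
        progress = False
        for rank, stream in enumerate(streams):
            if stream:
                tensor = stream.pop(0)
                pending[tensor].add(rank)
                progress = True
                if len(pending[tensor]) == num_workers:
                    issued.append(tensor)
                    del pending[tensor]
    return issued

# Worker 0 requests grad_a then grad_b; worker 1 the reverse order.
# A naive blocking all-reduce would deadlock here; the coordinator
# simply issues each tensor once both workers are ready.
order = coordinator_schedule([["grad_a", "grad_b"], ["grad_b", "grad_a"]])
```

In the real system the coordinator runs on a background thread of rank 0 and the actual reduction is delegated to the MPI or Gloo controller; this sketch only captures the readiness-matching logic.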
Tencent Cloud Developer
May 22, 2020 · Artificial Intelligence

Distributed Training for WeChat Scan-to-Identify Using Horovod, MPI, and NCCL

WeChat’s Scan‑to‑Identify system now trains its CNN models across multiple GPUs using Horovod’s data‑parallel, synchronous Ring All‑Reduce architecture built on MPI and NCCL. The move cut training time from several days to under one day while maintaining accuracy; future work will target I/O optimization and further scaling.

AI · Deep Learning · Horovod
12 min read
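The Ring All‑Reduce pattern referenced above can be simulated in a few lines of plain Python (a sketch of the algorithm's data movement, not of NCCL itself): each of n ranks owns one chunk of the tensor, and n−1 scatter‑reduce steps followed by n−1 all‑gather steps leave every rank holding the full sum.

```python
def ring_allreduce(chunks_per_rank):
    """Simulate ring all-reduce: chunks_per_rank[r][c] is rank r's
    local value of chunk c. Returns every rank's final chunks."""
    n = len(chunks_per_rank)
    data = [list(c) for c in chunks_per_rank]
    # Scatter-reduce: after n-1 steps, rank r holds the fully reduced
    # chunk (r + 1) % n. Sends within a step are concurrent, so
    # snapshot the outgoing values before applying them.
    for step in range(n - 1):
        sent = [data[r][(r - step) % n] for r in range(n)]
        for r in range(n):
            data[(r + 1) % n][(r - step) % n] += sent[r]
    # All-gather: each rank forwards its freshest fully reduced chunk
    # around the ring, overwriting stale copies downstream.
    for step in range(n - 1):
        sent = [data[r][(r + 1 - step) % n] for r in range(n)]
        for r in range(n):
            data[(r + 1) % n][(r + 1 - step) % n] = sent[r]
    return data

# Three ranks, tensor split into three scalar chunks.
result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Every rank ends up with the element-wise sum [12, 15, 18].
```

Because each rank sends and receives only 2·(n−1)/n of the tensor in total, the ring pattern keeps per-link bandwidth roughly constant as the number of GPUs grows, which is why it suits the synchronous data-parallel training described in the article.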
Efficient Ops
Jun 25, 2015 · Big Data

Inside Baidu’s 8‑Year Evolution of Hadoop and Distributed Computing

This article chronicles Baidu’s eight‑year journey from early Hadoop adoption to advanced MPI, DAG engines, and real‑time streaming platforms, detailing architectural milestones, performance optimizations, and practical lessons for large‑scale offline and online data processing.

Baidu · DAG · Hadoop
21 min read