
Overview of High‑Performance Computing (HPC): Architecture, Metrics, Cluster Management, Job Scheduling, and Parallel Programming Models

This article provides a comprehensive overview of high‑performance computing, covering system architectures, hardware components, performance metrics, network topologies, common parallel file systems, cluster management functions, mainstream job‑scheduling systems, and MPI‑based parallel programming models.


High‑Performance Computing (HPC) refers to computing systems that combine many processors—either as part of a single machine or as nodes in a cluster—to deliver large‑scale computational resources.

Most cluster‑based HPC systems use high‑performance network interconnects such as InfiniBand or Myrinet; simple bus topologies are possible, but mesh networks provide lower latency and higher throughput in high‑performance environments.

High‑Performance Computing Hardware and Overall Structure

Typical HPC hardware includes compute nodes, I/O nodes, login nodes, management nodes, high‑speed networks, and storage systems.

HPC Cluster Performance Indicators

FLOPS (floating‑point operations per second) is used to evaluate a computer’s computational capability; theoretical performance can be calculated from hardware specifications.

CPU theoretical peak (example: Intel CPU): single precision = frequency × (vector width / 32) × 2; double precision = frequency × (vector width / 64) × 2. The factor of 2 accounts for the fused multiply‑add instruction, which performs one multiply and one add per cycle.

GPU theoretical performance (example: NVIDIA GPU): Single‑precision: instruction throughput × number of compute units × frequency.

MIC theoretical performance (example: Intel MIC): same formula as CPU.
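The formulas above can be sketched as a small C helper. This is an illustrative sketch, not vendor code: the factor of 2 is the multiply‑add, and scaling the per‑core figure by core count is an assumption the formulas above leave implicit.

```c
#include <assert.h>

/* Theoretical double-precision peak in GFLOPS, following the formula
 * above: frequency x (vector width / 64) x 2, where the factor of 2 is
 * the multiply-add. Multiplying by core count to get a whole-socket
 * figure is an assumption added here for illustration. */
double cpu_peak_dp_gflops(double freq_ghz, int vector_bits, int cores) {
    double lanes = vector_bits / 64.0;      /* 64-bit doubles per SIMD register */
    return freq_ghz * lanes * 2.0 * cores;  /* 2 ops per multiply-add */
}

/* Same idea for single precision: 32-bit lanes instead of 64-bit. */
double cpu_peak_sp_gflops(double freq_ghz, int vector_bits, int cores) {
    return freq_ghz * (vector_bits / 32.0) * 2.0 * cores;
}
```

For example, a hypothetical 2.5 GHz, 16‑core CPU with 256‑bit vectors works out to 2.5 × 4 × 2 × 16 = 320 double‑precision GFLOPS.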

Benchmark programs such as Linpack and High‑Performance Linpack (HPL) are used to evaluate overall system capability; HPL is the standard benchmark for the TOP500 and TOP100 lists.

Homogeneous and Heterogeneous Compute Nodes

Homogeneous nodes contain only CPUs (single‑, dual‑, quad‑, or octa‑socket configurations). Heterogeneous nodes combine CPUs with accelerators such as GPUs, MICs, or FPGAs to improve performance and energy efficiency.

Common Parallel File Systems

PVFS : Parallel Virtual File System from Clemson University, an open‑source solution for Linux clusters.

Lustre : A widely used parallel distributed file system for large clusters and supercomputers, released under GPLv2.

Cluster Management System Main Functions

Monitoring: tracks node, network, storage, and power status.

User management: create, edit, and delete users and groups.

Network management.

File management: upload, create, copy, rename, delete, download.

Power management.

Job submission and management: submit, monitor, execute, and delete jobs, view logs.

Graphical user interface for easier operation.

Cluster Job Scheduling Systems

The four major families are PBS (including OpenPBS, PBS Pro, Torque), Slurm, LSF (Spectrum LSF, Platform LSF, OpenLava), and SGE. PBS, originally developed at NASA Ames, supports batch, interactive, MPI, PVM, HPF, and MPL jobs and is highly portable (POSIX‑compatible).

Slurm (Simple Linux Utility for Resource Management) is an open‑source, Linux‑only scheduler with high fault tolerance, support for heterogeneous resources, and extensive plugin architecture.

LSF (Load Sharing Facility) has commercial variants (Spectrum LSF, Platform LSF) and an open‑source fork, OpenLava; it scales to very large clusters and can auto‑scale onto cloud providers (AWS, Azure, Google Cloud).

Parallel Programming Models

Message Passing Interface (MPI) is the dominant model for distributed memory parallelism. MPI programs consist of multiple processes with independent address spaces that communicate via explicit message‑passing calls.

Key MPI implementations:

OpenMPI : Open‑source MPI‑2 implementation, widely used across platforms.

Intel MPI : MPI library integrated with Intel compilers.

MPICH : Reference implementation developed by Argonne and Mississippi State, highly portable.

MVAPICH / MVAPICH2 : Optimized for InfiniBand interconnects (originally via the VAPI interface); the latest version at the time of writing was 2.2b.

Download links (examples):
OpenMPI: http://www.open-mpi.org/
MPICH: http://www.mpich.org/
MVAPICH2: http://mvapich.cse.ohio-state.edu/

The article aggregates information from various sources such as “High‑Performance Computing Knowledge Summary”, “OpenMP Compilation Principles and Implementation”, and several industry reports.

Tags: High Performance Computing, Parallel Computing, Cluster, MPI, HPC, Job Scheduling
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
