
BeeGFS Parallel File System: Architecture, Components, Installation, and Tuning Guide

BeeGFS is a free parallel file system for Linux (with a GPL‑licensed client) that offers scalable storage through a modular architecture of management, metadata, and object storage servers, supports a wide range of hardware and OS platforms, and comes with detailed installation, configuration, and performance‑tuning guidance, including the BeeOND burst‑buffer extension.


Originally named FhGFS, BeeGFS was developed at the Fraunhofer Institute for Industrial Mathematics (ITWM); it was renamed BeeGFS in 2014 and has since seen broad adoption in research and commercial HPC environments. ThinkParQ, founded in late 2013 by key BeeGFS developers, provides professional support, services, and consulting for BeeGFS customers.

BeeGFS is both a network and parallel file system. Clients communicate with storage servers over TCP/IP or any RDMA‑capable interconnect (InfiniBand, RoCE, Omni‑Path) using the native verbs interface. Adding more servers aggregates capacity and performance under a single namespace.
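Which interconnect a client uses can be controlled explicitly. A minimal sketch, assuming the interface names `ib0` and `eth0` and the standard `/etc/beegfs` config location, of pinning the preferred interface order via the `connInterfacesFile` option:

```shell
# Prefer the InfiniBand interface, fall back to Ethernet.
# Interface names are examples; adjust to your hosts.
cat > /etc/beegfs/connInterfacesFile <<'EOF'
ib0
eth0
EOF

# Then reference the file in /etc/beegfs/beegfs-client.conf:
#   connInterfacesFile = /etc/beegfs/connInterfacesFile
```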

BeeGFS is free to download and use, with no licensing fees: the client is GPL‑licensed and all components can be obtained from www.beegfs.com, while ThinkParQ offers optional professional support and integration services.

BeeGFS Operating System Compatibility

BeeGFS runs on a wide range of hardware platforms (x86, x86_64, ARM, OpenPower) and Linux distributions, including RHEL, Scientific Linux, CentOS, SUSE Linux Enterprise Server/Desktop, OpenSUSE, Debian, and Ubuntu.

BeeGFS System Architecture

The system separates ObjectData (user data) from MetaData (metadata such as permissions, size, location). MetaData servers locate the appropriate storage servers, allowing clients to retrieve MetaData first and then directly communicate with ObjectData servers.

BeeGFS serves any workload requiring large or fast storage, from high‑throughput computing to massive research datasets. The number of Object Storage Servers (OSS) and Metadata Servers (MDS) can be elastically scaled to meet performance demands.

All components run on Linux. A BeeGFS deployment consists of one Management Server (MS), one or more Metadata Servers (MDS), one or more Object Storage Servers (OSS), and the client nodes. Two auxiliary services complete the picture:

Helper‑daemon : a client‑side daemon required for the client to operate.

Admon : an optional administration and monitoring daemon that provides system insights but is not required for basic operation.

BeeGFS is designed to work alongside POSIX‑compatible local file systems (ext4, XFS, ZFS), allowing administrators to choose familiar storage back‑ends.

Management Server (MS)

The single MS stores configuration for all components (clients, metadata servers, storage targets). It tags storage and metadata targets as normal, low, or critical, influencing target selection based on available space.
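As a sketch of how this looks in practice (paths, hostnames, and threshold values below are placeholders), the management service is initialized on a data directory, and the normal/low/critical capacity classes are driven by thresholds in its config file:

```shell
# Initialize and start the management service (path is a placeholder).
/opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/beegfs_mgmtd
systemctl start beegfs-mgmtd

# Capacity classes are set in /etc/beegfs/beegfs-mgmtd.conf, e.g.:
#   tuneStorageSpaceLowLimit       = 1T    # below this, target is "low"
#   tuneStorageSpaceEmergencyLimit = 20G   # below this, target is "critical"
```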

Metadata Server (MDS)

The MDS holds the file system's metadata. Each MDS has a Metadata Target (MDT), typically on SSDs in RAID1/10 for best performance; RAID5/6 is discouraged because metadata access is dominated by small random I/O. Each directory is assigned to one MDS instance, so metadata load scales by distributing directories across many MDS nodes.

Root‑level directories reside on MDS#1, providing a single entry point, while lower‑level directories link to their responsible MDS.
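Setting up a metadata server follows the same pattern as the other services; a minimal sketch, with placeholder path, service ID, and management hostname:

```shell
# Initialize a metadata target on an SSD-backed volume.
# -s sets the numeric service ID, -m names the management host.
/opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/beegfs_meta -s 2 -m mgmt01
systemctl start beegfs-meta
```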

Thread count for services should be chosen carefully: too many threads waste CPU and memory, while too few limit performance.
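Worker thread counts are set per service via the `tuneNumWorkers` option; the values below are illustrative starting points, not recommendations for any specific hardware:

```shell
# Config fragments (example values; tune against your own workload):
# /etc/beegfs/beegfs-meta.conf
#   tuneNumWorkers = 32   # metadata: many small random ops benefit from more threads
# /etc/beegfs/beegfs-storage.conf
#   tuneNumWorkers = 12   # storage: a few threads per OST is a common starting point
```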

Object Storage Server (OSS)

OSS stores the actual file data. Each OSS may host multiple Object Storage Targets (OST), which can be local file systems (XFS, ext4, ZFS) or LUNs. A typical OSS uses 6‑12 disks in RAID6; a 36‑disk OSS might consist of three OSTs with 12 disks each.

OSS runs as a multithreaded user‑space daemon, compatible with any POSIX‑compliant local file system. Thread count depends on the number and performance of OSTs; OSS I/O is usually large sequential transfers.
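For the 36‑disk example above, each RAID6 volume becomes one OST registered with the same OSS; a sketch with placeholder mount points, service ID, and target IDs:

```shell
# Register one OST per RAID6 volume on this OSS.
# -s is the server's service ID, -i the target ID (placeholders).
/opt/beegfs/sbin/beegfs-setup-storage -p /mnt/raid6_a/beegfs_storage -s 3 -i 301 -m mgmt01
/opt/beegfs/sbin/beegfs-setup-storage -p /mnt/raid6_b/beegfs_storage -s 3 -i 302 -m mgmt01
systemctl start beegfs-storage
```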

BeeGFS supports striping: each directory defines numtargets (the number of OSTs a file spans) and chunksize (the amount of data written to one OST before moving to the next). Striping raises both single‑file capacity and throughput: four 30 TB OSTs, for example, allow a single file of up to 120 TB and, if each target sustains 500 MB/s, an aggregate of about 2 GB/s.
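Striping is configured per directory with beegfs-ctl; the directory path below is a placeholder:

```shell
# Stripe new files in this directory across 4 targets,
# writing 1 MiB per target before moving to the next; then verify.
beegfs-ctl --setpattern --numtargets=4 --chunksize=1m /mnt/beegfs/scratch
beegfs-ctl --getentryinfo /mnt/beegfs/scratch
```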

File System Client

The Linux client is a kernel module that must be compiled for the running kernel. It provides a standard mount point and includes two daemons:

beegfs‑helperd : performs auxiliary functions on behalf of the kernel module, such as DNS lookups and writing the client log file.

beegfs‑client : loads the kernel module and recompiles it automatically when the kernel changes.
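Bringing a client online then amounts to pointing it at the management host and starting the two services; a sketch with a placeholder hostname:

```shell
# Point the client at the management host and mount the file system.
/opt/beegfs/sbin/beegfs-setup-client -m mgmt01
systemctl start beegfs-helperd
systemctl start beegfs-client   # builds the kernel module if needed, then mounts
mount | grep beegfs             # default mount point is /mnt/beegfs
```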

All services can run on separate hosts or be co‑located on a single machine for small deployments, a configuration sometimes called a “fusion device.”

Installation and Setup

BeeGFS can be installed via a GUI (Java‑based) or manually using shell commands. The GUI is recommended for beginners, while experienced users prefer manual installation for full flexibility.
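A manual installation on a RHEL‑family host might look like the following sketch; the repository URL and version are placeholders, so check www.beegfs.com for the current repo file:

```shell
# Add the BeeGFS package repository (URL/version are placeholders).
wget -O /etc/yum.repos.d/beegfs.repo \
  https://www.beegfs.io/release/beegfs_7.3/dists/beegfs-rhel8.repo

yum install beegfs-mgmtd                              # on the management host
yum install beegfs-meta                               # on metadata hosts
yum install beegfs-storage                            # on storage hosts
yum install beegfs-client beegfs-helperd beegfs-utils # on client nodes
```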

BeeGFS Tuning and Configuration

Tuning covers storage server formatting, metadata server settings, client parameters, striping, network (InfiniBand/Ethernet) optimization, and cache configuration. Detailed guidance is available in the official BeeGFS configuration manual.
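As an illustration of storage‑target tuning, the sketch below formats an XFS volume aligned to the RAID geometry and favors large sequential I/O at the block layer; all values are examples (an 8+2 RAID6 with 128 KiB stripe units) and must be adjusted to the actual array:

```shell
# Format aligned to RAID stripe geometry (su = stripe unit, sw = data disks).
mkfs.xfs -d su=128k,sw=8 -l version=2,su=128k /dev/sdb
mount -o noatime,nodiratime,logbufs=8,logbsize=256k /dev/sdb /mnt/raid6_a

# Favor large sequential I/O on the block device:
echo deadline > /sys/block/sdb/queue/scheduler
echo 4096    > /sys/block/sdb/queue/nr_requests
echo 4096    > /sys/block/sdb/queue/read_ahead_kb
```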

The beegfs-ctl tool reads the default client configuration file ( beegfs-client.conf ) to locate the management daemon, so it can run on any host with a basic client configuration, even where the file system is not mounted.
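A few read‑only health checks with the bundled tools give a quick picture of the cluster:

```shell
beegfs-ctl --listnodes --nodetype=storage --nicdetails   # registered servers and their NICs
beegfs-ctl --listtargets --nodetype=storage --state      # target reachability/consistency
beegfs-df                                                # per-target capacity and usage
```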

BeeOND Burst Buffer

BeeOND (BeeGFS On‑Demand) creates temporary BeeGFS instances on demand, typically for the lifetime of a compute job, aggregating local SSDs or HDDs on the compute nodes to provide extra performance and burst‑buffer capacity.

Most HPC clusters use a global parallel file system, but compute nodes often have local storage; BeeOND leverages this to reduce I/O spikes on the shared file system, enable isolated workloads, and accelerate HPC jobs with SSD‑based burst buffers. It integrates easily with workload managers like Torque or Slurm and incurs no additional hardware cost.

BeeOND is packaged as a standard software component, installable via the distribution’s package manager, and automatically pulls in required BeeGFS server and client dependencies.
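A per‑job BeeOND instance is started and stopped with the beeond wrapper; paths and the node file below are placeholders (the node file lists one hostname per line, e.g. derived from the job's Slurm allocation):

```shell
# Start a temporary instance across the job's nodes:
#   -n node file, -d local data dir on each node, -c mount point.
beeond start -n /tmp/nodefile -d /local/ssd/beeond -c /mnt/beeond

# ... run the job against /mnt/beeond, stage results out, then tear down:
beeond stop -n /tmp/nodefile -L -d   # stop and clean up (flags per the BeeOND docs)
```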

Simple Summary

BeeGFS’s lightweight architecture, precise handling of modern HPC burst‑buffer needs, open‑source model, and strong ecosystem position it as a compelling alternative to Lustre, with rich enterprise features and a growing partner network.

Tags: Performance Tuning · Linux · Storage · Installation · HPC · Parallel File System · BeeGFS
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
