
HadaFS: A New Burst Buffer File System for Scalable High‑Performance Computing

This article presents HadaFS, a burst‑buffer‑based distributed file system that combines the scalability of local burst buffers with the data‑sharing advantages of shared ones. It covers the Localized Triage Architecture (LTA), metadata handling, the Hadash management tool, and performance evaluations on the SNS supercomputer.

Architects' Tech Alliance

The paper introduces HadaFS, a new burst‑buffer (BB) file system designed for extreme‑scale high‑performance computing (HPC) environments. HadaFS layers on top of the global file system and provides a distributed storage tier that offers both the performance of local BBs and the data‑sharing benefits of shared BBs.

Background: Modern HPC workloads generate massive I/O demands, prompting the adoption of BB technologies that use SSDs as an intermediate acceleration layer between compute nodes and back‑end storage. Two BB deployment models exist: local BB (SSD per compute node) and shared BB (SSD on dedicated I/O forwarding nodes). Each has trade‑offs in scalability, cost, and data‑sharing capability.

Motivation: Existing BB solutions struggle with scalability to hundreds of thousands of concurrent I/O streams, flexible consistency semantics, and dynamic data migration. The authors aim to unify the advantages of local and shared BBs while reducing deployment cost.

Design and Implementation (HadaFS): HadaFS consists of a client library, a set of HadaFS servers, and a data‑management tool called Hadash. Clients intercept POSIX I/O calls and forward them to a bridge server; servers store file data on NVMe SSDs and metadata in RocksDB. Two metadata databases are maintained per server: a local metadata DB (LMDB) for node‑local changes and a global metadata DB (GMDB) for cluster‑wide state.

Localized Triage Architecture (LTA): Each client connects to a single bridge server, which forwards requests to other servers when needed; the bridge servers themselves form a full mesh. This provides local‑BB‑like performance while enabling shared‑BB data access across the cluster.
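The triage idea can be illustrated with a minimal sketch, assuming a path‑hash placement policy and in‑process objects standing in for real servers; the class and method names are illustrative, not HadaFS's actual API.

```python
import hashlib

class BridgeServer:
    """One server in the mesh; each client attaches to exactly one bridge server."""

    def __init__(self, sid, mesh):
        self.sid = sid
        self.mesh = mesh      # full mesh: every server can reach every other
        self.store = {}       # path -> data held on this server's SSD

    def owner(self, path):
        # Illustrative placement: hash the full path onto the server mesh.
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return self.mesh[h % len(self.mesh)]

    def write(self, path, data):
        target = self.owner(path)
        if target is self:
            self.store[path] = data       # local-BB-like fast path
        else:
            target.write(path, data)      # one forwarding hop across the mesh

    def read(self, path):
        # Any bridge server can serve any path: shared-BB-style data sharing.
        return self.owner(path).store.get(path)

# A 4-server mesh; a client talks only to its own bridge server.
mesh = []
mesh.extend(BridgeServer(i, mesh) for i in range(4))
mesh[0].write("/job1/out.dat", b"checkpoint")    # written via bridge 0
assert mesh[3].read("/job1/out.dat") == b"checkpoint"  # readable via bridge 3
```

A client never needs to know the whole cluster topology: one hop to its bridge server, at most one forwarding hop beyond it.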

Metadata and Namespace: HadaFS abandons traditional directory trees in favor of a full‑path hash that serves as a globally unique key. Metadata is stored as key‑value pairs, allowing fast prefix‑based lookups and virtual directory views.
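A flat, full‑path‑keyed namespace makes directory listings a prefix scan rather than a tree walk. Below is a minimal sketch with a plain dict standing in for the RocksDB key‑value store; the paths and attributes are invented for illustration.

```python
# Metadata as flat key-value pairs keyed by full path; a plain dict
# stands in here for the ordered RocksDB store HadaFS uses.
meta = {
    "/app/run1/out.h5":  {"size": 4096, "mode": 0o644},
    "/app/run1/log.txt": {"size": 128,  "mode": 0o644},
    "/app/run2/out.h5":  {"size": 8192, "mode": 0o644},
}

def list_virtual_dir(prefix):
    """Emulate `ls` on a directory that exists only as a shared key prefix."""
    if not prefix.endswith("/"):
        prefix += "/"
    names = set()
    for path in meta:                  # an ordered KV store would do a prefix seek
        if path.startswith(prefix):
            names.add(path[len(prefix):].split("/", 1)[0])
    return sorted(names)

print(list_virtual_dir("/app"))        # ['run1', 'run2']
print(list_virtual_dir("/app/run1"))   # ['log.txt', 'out.h5']
```

No directory inodes exist anywhere: creating `/app/run2/out.h5` implicitly "creates" `/app/run2`, which is why the paper calls these directory views virtual.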

Hadash Tool: Hadash offers familiar POSIX‑style commands (ls, du, find, grep) for metadata queries and uses a Redis pipeline to issue data‑migration commands to HadaFS servers, enabling efficient staging between HadaFS and the underlying global file system (GFS).
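The benefit of pipelining migration commands is batching: many stage‑in/stage‑out requests travel in one dispatch instead of one round trip each. The sketch below models only that batching idea; `CommandPipeline` and its methods are hypothetical stand‑ins, not the real Redis protocol Hadash uses.

```python
class CommandPipeline:
    """Illustrative stand-in for a Redis-style pipeline: buffer migration
    commands and dispatch them to the servers in a single batch."""

    def __init__(self):
        self.buffered = []

    def stage_out(self, path):
        # Migrate a file from HadaFS burst-buffer storage down to the GFS.
        self.buffered.append(("stage_out", path))
        return self                      # allow chaining, pipeline-style

    def stage_in(self, path):
        # Prefetch a file from the GFS up into HadaFS before a job runs.
        self.buffered.append(("stage_in", path))
        return self

    def execute(self):
        # One batched dispatch; a real pipeline would send every command
        # over a single connection and collect the replies together.
        results = [(op, path, "ok") for op, path in self.buffered]
        self.buffered.clear()
        return results

pipe = CommandPipeline()
acks = pipe.stage_in("/gfs/input.nc").stage_out("/hadafs/ckpt.dat").execute()
```

For large‑scale staging, amortizing the per‑command round trip this way is what makes migration of millions of files tractable.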

Optimizations: Three metadata synchronization modes are provided (asynchronous, mixed, and near‑synchronous) to balance consistency and performance. HadaFS avoids kernel‑level I/O staging, reducing overhead, and supports dynamic client‑to‑server mapping to mitigate I/O interference.
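The three modes can be thought of as three triggers for pushing local metadata updates to the global database. The sketch below is a simplified model of that trade‑off; the mode names, trigger conditions, and class layout are illustrative, not the paper's exact design.

```python
from enum import Enum

class SyncMode(Enum):
    ASYNC = 1       # global DB catches up in the background (fastest, weakest)
    MIXED = 2       # global DB updated synchronously only on close()
    NEAR_SYNC = 3   # global DB updated on every metadata change (strongest)

class MetadataServer:
    def __init__(self, mode):
        self.mode = mode
        self.lmdb = {}       # node-local metadata DB: always current
        self.gmdb = {}       # cluster-wide metadata DB: currency depends on mode
        self.pending = []    # local updates not yet pushed globally

    def update(self, path, attrs, closing=False):
        self.lmdb[path] = attrs
        if self.mode is SyncMode.NEAR_SYNC or (
            self.mode is SyncMode.MIXED and closing
        ):
            self.gmdb[path] = attrs               # pay the sync cost now
        else:
            self.pending.append((path, attrs))    # defer and batch

    def flush(self):
        # Background propagation of deferred updates to the global DB.
        for path, attrs in self.pending:
            self.gmdb[path] = attrs
        self.pending.clear()
```

An application that never shares files mid‑run can pick the asynchronous mode and skip nearly all global‑synchronization cost, while a producer‑consumer workflow can pay for near‑synchronous updates only where it needs them.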

Performance Evaluation: Experiments on the Sunway Next‑Generation Supercomputer (SNS) with over 100 000 compute nodes show that HadaFS outperforms BeeGFS and the traditional Lustre‑based GFS in metadata operations (MDTest), data I/O bandwidth (IOR), and large‑scale data migration (compared with Cray DataWarp). HadaFS achieves up to 3.1 TB/s aggregate bandwidth and supports up to 600 000 concurrent clients.

Conclusion: HadaFS demonstrates that a shared‑BB architecture with LTA can deliver local‑BB‑level scalability and performance while maintaining low deployment cost and flexible consistency, making it suitable for future exascale HPC applications.

Tags: High Performance Computing · File System · Performance Evaluation · Metadata Management · Burst Buffer · HPC Storage
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
