Lustre Performance Optimization Guide
This article provides a comprehensive guide to optimizing Lustre, the leading open‑source parallel file system for high‑performance computing, covering network bandwidth, stripe settings, client configuration, RAID choices, small‑file handling, and practical system commands to improve aggregate I/O performance.
Lustre has become synonymous with high‑performance computing (HPC) as the most widely adopted open‑source parallel file system, backed by vendors such as Intel and DDN, with DDN now handling Intel’s Lustre business.
Network bandwidth is a primary factor for aggregate Lustre throughput. The system reads data from multiple OSS nodes simultaneously, so the underlying network must be fast enough. Consider network type (TCP/IP vs. InfiniBand), NIC speed (10 GbE, 40 GbE), number of NICs, and bonding mode. InfiniBand typically offers higher performance at greater cost, while 40 GbE outperforms 10 GbE.
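As a rough capacity check, aggregate throughput is bounded by whichever side of the network saturates first: the combined OSS links or the combined client links. A minimal sketch of this bottleneck model (the function name and units are illustrative, not part of any Lustre tool):

```python
def aggregate_bandwidth_gbps(oss_count, oss_link_gbps, client_count, client_link_gbps):
    """Estimate peak aggregate throughput as the lesser of the two sides'
    combined link capacity. Ignores protocol overhead and disk limits."""
    return min(oss_count * oss_link_gbps, client_count * client_link_gbps)

# Example: 8 OSS nodes on 40 GbE serving 16 clients on 10 GbE.
# The client side (16 x 10 = 160 Gb/s) is the bottleneck, not the
# OSS side (8 x 40 = 320 Gb/s).
```

This back-of-the-envelope check is often enough to tell whether adding NICs, bonding links, or moving to InfiniBand will actually raise aggregate bandwidth.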
Lustre internal settings focus on stripe size (minimum 64 KB), stripe count, and start-OST. Proper striping enables I/O concurrency across OSTs. By default, start-OST is -1 (no fixed starting OST), which lets Lustre balance load across OSTs. Increasing stripe size beyond 64 KB can reduce aggregate bandwidth, while increasing stripe count generally improves it up to a point.
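The mapping from a byte offset to an OST object can be sketched assuming the round-robin, RAID-0-style layout Lustre uses by default (the function is our illustration, not a Lustre API):

```python
def locate(offset, stripe_size, stripe_count):
    """Map a byte offset in a striped file to (ost_index, offset_in_object),
    assuming round-robin RAID-0-style striping across stripe_count OSTs."""
    stripe_no = offset // stripe_size                  # global chunk number
    ost_index = stripe_no % stripe_count               # which object/OST holds it
    obj_off = (stripe_no // stripe_count) * stripe_size + offset % stripe_size
    return ost_index, obj_off

# With 64 KB stripes over 4 OSTs, byte 0 lands at the start of OST 0's
# object, while a byte in the sixth stripe lands on OST 1.
```

The sketch makes the concurrency argument concrete: consecutive 64 KB chunks land on different OSTs, so a large sequential read keeps stripe_count servers busy at once.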
Client configuration influences performance through the number of client processes (connections), I/O block size, and the total number of clients. More connections raise aggregate bandwidth until saturation, after which performance plateaus or declines. Larger I/O block sizes improve bandwidth up to a stable range (≈64 KB–64 MB). Adding clients boosts read‑mode bandwidth noticeably, with less effect on write‑mode.
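The effect of I/O block size is easy to experiment with. A minimal benchmark kernel (illustrative and not Lustre-specific; the function name is ours) that writes a file in fixed-size blocks, the same access pattern the block-size comparison above assumes:

```python
def write_file(path, total_bytes, block_size):
    """Write total_bytes to path in block_size chunks; timing repeated runs
    with different block sizes shows where bandwidth levels off."""
    buf = b"\0" * block_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)  # last block may be short
            f.write(buf[:n])
            written += n
    return written
```

Running this on a Lustre mount with block sizes from 4 KB up to 64 MB, and timing each run, reproduces the plateau described above.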
Storage RAID choices affect both capacity and protection. Hardware RAID outperforms software RAID but costs more; RAID 6 is common for data protection. OSTs usually use low‑cost SATA disks, while MDS nodes often employ SSDs for metadata speed.
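The capacity cost of RAID 6 is easy to quantify: two disks' worth of capacity per array go to parity. A small sketch (the function name is ours):

```python
def raid6_usable_tb(disks, disk_tb):
    """Usable capacity of a RAID 6 array: two disks are reserved for parity."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks - 2) * disk_tb

# A 10-disk array of 4 TB SATA drives yields 32 TB usable,
# i.e. a 20% capacity overhead for double-parity protection.
```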
Small‑file optimization includes aggregating small files (e.g., tar, loopback mounts), using O_DIRECT with 4 KB I/O size, ensuring sequential writes, deploying SSD‑backed or RAID 1+0 OSTs, and reducing metadata overhead. System‑level tweaks are also recommended:
Disable LNET debug: sysctl -w lnet.debug=0 (equivalently, lctl set_param debug=0)
Increase the client dirty cache (osc.*.max_dirty_mb, default 32 MB) to improve I/O buffering.
Raise RPC parallelism (osc.*.max_rpcs_in_flight, default 8, up to 32) to boost data and metadata throughput.
Set Lustre striping to a single OST for small files: lfs setstripe -c 1 /path/filename (a count of 0 uses the filesystem default, and -1 stripes across all OSTs).
Use local file locking: mount -t lustre -o localflock <mgsnode>:/<fsname> /mountpoint
Employ loopback mounts to convert metadata operations into OSS reads/writes.
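The small-file aggregation mentioned above can be as simple as bundling a directory of tiny files into a single tar archive, so the filesystem handles one large object instead of thousands of small ones. A minimal sketch using Python's standard tarfile module (the function name is ours):

```python
import pathlib
import tarfile

def aggregate(src_dir, archive_path):
    """Bundle every file under src_dir into one tar archive, turning many
    small-file metadata operations into a single large sequential write."""
    with tarfile.open(archive_path, "w") as tar:
        for p in sorted(pathlib.Path(src_dir).rglob("*")):
            if p.is_file():
                tar.add(p, arcname=str(p.relative_to(src_dir)))
```

The archive can later be extracted locally or accessed through a loopback mount, which is what converts per-file metadata operations into plain OSS reads and writes.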
These methods work well for scratch spaces but should be applied cautiously to production data.
For more detailed HPC technology insights, refer to the original e‑book “High‑Performance Computing (HPC) Technology, Solutions, and Industry Overview”.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.