Performance Evaluation of AMD Milan‑X 7773X CPU for HPC Workloads (WRF and OpenFOAM) on NF5468A5 Server
The article presents a detailed benchmark of AMD's Milan‑X 7773X processor, highlighting its 3D V‑Cache architecture and large 768 MB L3 cache, and demonstrates up to 1.80× speed‑up for HPC applications WRF and OpenFOAM on a dual‑socket NF5468A5 GPU server.
The AMD Milan‑X 7773X, released in 2022, is an upgraded third‑generation EPYC processor. Its 3D V‑Cache technology triples the L3 cache to 768 MB per socket. The package is a multi‑chip module (MCM) with eight CCDs and a large I/O die, and it remains compatible with LGA 4094 motherboards that support up to 4 TB of memory per socket.
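The 768 MB figure can be checked with a short calculation. This is only a sketch: the article gives the CCD count and the total, while the per‑CCD split used here (32 MB of base L3 plus 64 MB of stacked V‑Cache) is an assumption based on AMD's published Milan and Milan‑X specifications, not something stated in the article.

```python
# Sketch: L3 capacity per socket for Milan vs. Milan-X.
CCDS_PER_SOCKET = 8        # from the article
BASE_L3_PER_CCD_MB = 32    # assumption: base L3 on a standard Milan CCD
VCACHE_PER_CCD_MB = 64     # assumption: stacked 3D V-Cache on each CCD


def l3_per_socket_mb(ccds, base_mb, stacked_mb=0):
    """Total L3 per socket in MB: CCD count times per-CCD capacity."""
    return ccds * (base_mb + stacked_mb)


milan_mb = l3_per_socket_mb(CCDS_PER_SOCKET, BASE_L3_PER_CCD_MB)
milan_x_mb = l3_per_socket_mb(CCDS_PER_SOCKET, BASE_L3_PER_CCD_MB, VCACHE_PER_CCD_MB)

print(milan_mb, milan_x_mb, milan_x_mb / milan_mb)  # 256 768 3.0
```

Under these assumptions the dual‑socket test server would expose 2 × 768 MB = 1536 MB of L3 in total.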
To assess its impact on high‑performance computing (HPC) workloads, the authors selected two representative applications: the Weather Research and Forecasting model (WRF) and the computational fluid dynamics solver OpenFOAM. Tests were conducted on an Inspur NF5468A5 4U GPU server, a dual‑socket AMD platform that was tested in turn with Rome 7742, Milan 7543, and Milan‑X 7773X CPUs. The server is specified with eight NVIDIA A100/A30/A40 GPUs, 32 × 64 GB DDR4‑3200 memory (the platform supports up to 8 TB in total), and a PCIe 4.0 interconnect.
Platform specifications
Software environment and versions
Operating System: RedHat Enterprise Linux 8.3.2011 x86_64
Compiler: Intel Compiler 2021.2.0
Parallel environment: Intel MPI 2021.2.0
Application software: WRF‑v3.9.1, OpenFOAM‑v1906
WRF benchmark
The WRF test used a two‑level nested grid (12 km and 4 km resolution) with 425 × 300 × 35 and 1150 × 802 × 35 points respectively, running for a 3‑hour simulated period. Compared with the Rome 7742 baseline, Milan 7543 achieved a 1.14‑1.27× speed‑up, while Milan‑X 7773X delivered a 1.23‑1.34× improvement. L3 cache miss rates dropped from 50‑70 % on Milan 7543 to 25‑55 % on Milan‑X.
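To give a sense of the problem size, the grid dimensions above can be turned into point counts and a rough per‑field memory footprint. This is a back‑of‑the‑envelope sketch: the 4‑byte value size is an assumption (single‑precision reals), and a real WRF run holds many 3D fields plus halo regions, so the actual working set is much larger than one field.

```python
# Sketch: size of the two nested WRF domains described in the article.
BYTES_PER_VALUE = 4  # assumption: single-precision reals


def grid_points(nx, ny, nz):
    """Number of grid points in an nx x ny x nz domain."""
    return nx * ny * nz


def field_size_mib(points, bytes_per_value=BYTES_PER_VALUE):
    """Memory for one 3D field over the domain, in MiB."""
    return points * bytes_per_value / 2**20


outer = grid_points(425, 300, 35)    # 12 km domain: 4,462,500 points
inner = grid_points(1150, 802, 35)   # 4 km domain: 32,280,500 points

for name, pts in (("12 km", outer), ("4 km", inner)):
    print(f"{name}: {pts:,} points, {field_size_mib(pts):.1f} MiB per 3D field")
```

The 4 km nest holds roughly seven times as many points as the 12 km domain, so it dominates the memory traffic of the run.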
OpenFOAM benchmark
The OpenFOAM motorBike case (simpleFoam solver, k‑ω SST turbulence model) used a mesh of ~10.3 million cells. At the same core counts, Milan 7543 provided a 1.23‑1.28× speed‑up over Rome 7742, while Milan‑X 7773X achieved a 1.34‑1.80× speed‑up. L3 cache miss rates fell from ~40 % on Milan 7543 to 20‑30 % on Milan‑X.
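The drop in miss rate can also be expressed as a relative reduction in the share of L3 accesses that go to main memory. The sketch below uses only the OpenFOAM percentages quoted above. It is illustrative arithmetic, not a performance model: miss rate does not translate linearly into runtime.

```python
# Sketch: relative reduction in L3 miss rate for the OpenFOAM case.
def relative_reduction(before_pct, after_pct):
    """Fraction by which the miss rate fell, relative to the starting rate."""
    return (before_pct - after_pct) / before_pct


milan_miss_pct = 40                 # ~40 % on Milan 7543 (from the article)
milan_x_miss_pct_range = (20, 30)   # 20-30 % on Milan-X (from the article)

best = relative_reduction(milan_miss_pct, milan_x_miss_pct_range[0])
worst = relative_reduction(milan_miss_pct, milan_x_miss_pct_range[1])
print(f"L3 misses fall by {worst:.0%} to {best:.0%} relative to Milan")
```

In other words, between a quarter and a half of the L3 misses seen on Milan 7543 are served from cache on Milan‑X.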
Conclusion
The experiments confirm that the ultra‑large L3 cache of AMD Milan‑X CPUs significantly alleviates memory‑bound bottlenecks in HPC workloads, delivering a speed‑up of up to 1.34× for WRF and up to 1.80× for OpenFOAM relative to the Rome 7742 baseline.
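A speed‑up factor and a reduction in wall‑clock time are easy to confuse, so the conversion is worth spelling out. The sketch below assumes speed‑up is defined as baseline runtime divided by new runtime, which is the usual convention; the article does not state its definition explicitly.

```python
# Sketch: convert a speed-up factor into the fraction of runtime saved.
def runtime_reduction(speedup):
    """Fraction of baseline wall-clock time saved, assuming
    speedup = baseline_time / new_time."""
    return 1.0 - 1.0 / speedup


peak_speedups = {"WRF": 1.34, "OpenFOAM": 1.80}  # best cases from the article

for app, s in peak_speedups.items():
    print(f"{app}: {s:.2f}x speed-up -> {runtime_reduction(s):.1%} less runtime")
# WRF: 1.34x speed-up -> 25.4% less runtime
# OpenFOAM: 1.80x speed-up -> 44.4% less runtime
```

Under that definition, a 1.80× speed‑up means 80 % more throughput, while the run itself finishes in a little over half the baseline time.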
Future work will explore multi‑node scaling of Milan‑X in larger clusters.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.