MareNostrum 5 NG-GRACE
System overview
MareNostrum 5 NG-GRACE is a high-performance computing (HPC) cluster that consists of 408 nodes, each powered by NVIDIA's Grace CPU Superchip. This high-efficiency, ARM-based processor is specifically designed to meet the demanding memory and processing needs of HPC and AI tasks. The system, developed by GIGABYTE, is housed in H263-V60-AAW1 servers, which have a 2U form factor. Each server chassis contains four air-cooled, half-width nodes that are optimized for scientific and data-intensive applications.
This setup provides a total of 58,752 processor cores and approximately 97.9TB of main memory (408 nodes × 240GB per node), as detailed in the table below:
Node type | Node count | Cores per node | Main memory per node |
---|---|---|---|
Grace - General Purpose (GP) | 408 | 144 | 240GB |
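The aggregate figures follow from the per-node values in the table by simple multiplication. The short sketch below shows the arithmetic; the function name `cluster_totals` is hypothetical, and the conversion assumes decimal units (1 TB = 1000 GB).

```python
# Per-node figures taken from the table above.
NODE_COUNT = 408
CORES_PER_NODE = 144
MEMORY_PER_NODE_GB = 240


def cluster_totals(nodes, cores_per_node, memory_per_node_gb):
    """Return (total cores, total memory in TB) for a homogeneous cluster."""
    total_cores = nodes * cores_per_node
    # Assumption: decimal units, 1 TB = 1000 GB.
    total_memory_tb = nodes * memory_per_node_gb / 1000
    return total_cores, total_memory_tb


cores, memory_tb = cluster_totals(NODE_COUNT, CORES_PER_NODE, MEMORY_PER_NODE_GB)
print(f"Total cores:  {cores}")
print(f"Total memory: {memory_tb:.1f} TB")
```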
Each MareNostrum 5 NG-GRACE node has the following specifications:
- 2x NVIDIA Grace 72C 3.1GHz
- 2x Die 120GB 4266MHz LPDDR5
- 800GB NVMe local storage
- NVIDIA/Mellanox ConnectX-7 NDR200 InfiniBand (200Gb/s total bandwidth per node, OSFP, PCIe 5.0 x16)
Topology of a compute node
This is a high-level analysis of the `lstopo` command output from a compute node with two NVIDIA Grace processors.
Node-Level Overview
- Memory Configuration: The node has a total of 240GB of memory, split evenly across two NUMA nodes.
- Processor Packages: The node comprises two sockets (Socket L#0 and Socket L#1). Each socket is associated with its own NUMA node and L3 cache.
NUMA Nodes and Memory Distribution
- NUMA Node L#0: 120GB of memory.
- NUMA Node L#1: 120GB of memory.
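As a rough cross-check of the NUMA layout from inside a job, the per-node memory can be read from the standard Linux sysfs tree. The sketch below is a minimal example, not a site-provided tool: the helper names are hypothetical, and it assumes the usual `/sys/devices/system/node/nodeN/meminfo` files are readable on the compute node. The reported values may differ slightly from the nominal 120GB per NUMA node.

```python
import re
from pathlib import Path

# Standard Linux sysfs location for per-NUMA-node information.
SYSFS_NODE_DIR = Path("/sys/devices/system/node")


def parse_mem_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from a per-node meminfo file."""
    match = re.search(r"MemTotal:\s+(\d+)\s+kB", meminfo_text)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def numa_memory_gib(node_dir=SYSFS_NODE_DIR):
    """Return {numa_node_id: memory in GiB} for every node directory found."""
    result = {}
    for path in node_dir.glob("node[0-9]*"):
        node_id = int(path.name[len("node"):])
        mem_kb = parse_mem_total_kb((path / "meminfo").read_text())
        result[node_id] = mem_kb / 1024**2  # kB -> GiB
    return result
```

On a compute node, `numa_memory_gib()` would be expected to return two entries, one per socket.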
Cache Hierarchy
Each socket has a similar cache architecture:
- L3 Cache: 114MB per socket.
- L2 Cache: 1MB (1024KB) per core.
- L1 Cache: Each core has two types of L1 cache:
  - L1d Cache (Data): 64KB
  - L1i Cache (Instructions): 64KB
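The per-core and per-socket cache sizes above can be combined into node-level totals. The sketch below does this arithmetic under two assumptions: 72 cores per socket (from the "Grace 72C" specification) and binary units (1MB = 1024KB). The function name `cache_summary` is hypothetical.

```python
CORES_PER_SOCKET = 72    # from the node specification (Grace 72C)
SOCKETS_PER_NODE = 2
L1D_KB = 64              # per core
L1I_KB = 64              # per core
L2_KB = 1024             # per core
L3_MB_PER_SOCKET = 114   # shared within a socket


def cache_summary(cores_per_socket, sockets, l1d_kb, l1i_kb, l2_kb, l3_mb):
    """Aggregate cache capacity for one node (assumes 1 MB = 1024 KB)."""
    cores = cores_per_socket * sockets
    return {
        "l1_total_mb": cores * (l1d_kb + l1i_kb) / 1024,
        "l2_total_mb": cores * l2_kb / 1024,
        "l3_total_mb": sockets * l3_mb,
        # Average L3 share per core within a socket.
        "l3_per_core_mb": l3_mb / cores_per_socket,
    }


summary = cache_summary(
    CORES_PER_SOCKET, SOCKETS_PER_NODE, L1D_KB, L1I_KB, L2_KB, L3_MB_PER_SOCKET
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```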
Cores and Processing Units (PUs)
Each socket contains 72 cores, totaling 144 cores across the two sockets. The cores are structured as follows:
- Each core has a dedicated L1 and L2 cache.
- Each core exposes a single Processing Unit (PU), labeled PU L# in the `lstopo` output, so there is one hardware thread per core.
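Because each socket has its own NUMA node, it can be useful to keep a process on the cores of a single socket. The following is a minimal sketch of one way to do that from Python on Linux, not a site-recommended method: `parse_cpulist` and `pin_to_numa_node` are hypothetical helpers, and the actual CPU numbering of each socket is read from sysfs rather than assumed. Note that this only sets CPU affinity; it does not by itself bind memory allocations.

```python
import os


def parse_cpulist(cpulist):
    """Expand a Linux cpulist string such as "0-71" or "0-3,8" into a set of ints."""
    cpus = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node_id):
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    path = f"/sys/devices/system/node/node{node_id}/cpulist"
    with open(path) as handle:
        cpus = parse_cpulist(handle.read())
    os.sched_setaffinity(0, cpus)  # pid 0 refers to the calling process
    return cpus
```

On a compute node, `pin_to_numa_node(0)` would be expected to return a set of 72 CPU IDs, matching the 72 cores of the first socket.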
Summary
Each compute node pairs two NVIDIA Grace processors (144 cores in total) with 240GB of LPDDR5 memory split evenly across two NUMA nodes and 114MB of L3 cache per socket. This configuration is well suited to parallel workloads that require high memory bandwidth and many threads.