MareNostrum 5 NG-GRACE

System overview

MareNostrum 5 NG-GRACE is a high-performance computing (HPC) cluster of 408 nodes, each built around an NVIDIA Grace CPU Superchip. This energy-efficient, ARM-based processor is designed for the memory- and compute-intensive demands of HPC and AI workloads. The nodes are GIGABYTE H263-V60-AAW1 servers in a 2U form factor: each chassis holds four air-cooled, half-width nodes intended for scientific and data-intensive applications.

This setup provides a total of 58,752 processor cores (408 nodes × 144 cores) and about 97.9TB of main memory (408 nodes × 240GB), as detailed in the table below:

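| Node type                    | Node count | Cores per node | Main memory per node |
|------------------------------|------------|----------------|----------------------|
| Grace - General Purpose (GP) | 408        | 144            | 240GB                |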
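The system-wide figures follow directly from the per-node values in the table. The short sketch below multiplies them out and also derives the memory available per core, a number that is often useful when sizing jobs. The helper name `system_totals` is hypothetical, and gigabytes/terabytes are treated as decimal units (1TB = 1000GB), which is an assumption about how the table's units are meant.

```python
# Hypothetical helper: derive system-wide totals from the per-node table.
NODES = 408
CORES_PER_NODE = 144
MEM_PER_NODE_GB = 240  # assumed decimal gigabytes, as quoted in the table


def system_totals(nodes: int = NODES,
                  cores_per_node: int = CORES_PER_NODE,
                  mem_per_node_gb: int = MEM_PER_NODE_GB) -> dict:
    """Return total cores, total memory in TB and memory per core in GB."""
    total_cores = nodes * cores_per_node
    total_mem_tb = nodes * mem_per_node_gb / 1000  # GB -> TB (decimal units)
    mem_per_core_gb = mem_per_node_gb / cores_per_node
    return {
        "total_cores": total_cores,
        "total_mem_tb": total_mem_tb,
        "mem_per_core_gb": mem_per_core_gb,
    }


if __name__ == "__main__":
    t = system_totals()
    print(f"{t['total_cores']:,} cores")          # 58,752 cores
    print(f"{t['total_mem_tb']:.2f} TB memory")   # 97.92 TB memory
    print(f"{t['mem_per_core_gb']:.2f} GB/core")  # 1.67 GB/core
```

In practice this means a job that places one process per core has roughly 1.67GB of memory available to each process.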

Each MareNostrum 5 NG-GRACE node has the following specifications:

  • 2x NVIDIA Grace 72C 3.1GHz
  • 2x 120GB LPDDR5 at 4266MHz (one per Grace die)
  • 800GB NVMe local storage
  • NVIDIA/Mellanox ConnectX-7 NDR200 InfiniBand (200Gb/s total bandwidth per node, OSFP, PCIe 5.0 x16)

Topology of a compute node

The following is a high-level summary of the output of the lstopo command (part of the hwloc package) on a compute node with two NVIDIA Grace processors.

MareNostrum 5 Grace-GPP nodes topology

Node-Level Overview

  • Memory Configuration: The node has 240GB of main memory in total, split evenly across two NUMA nodes.
  • Processor Packages: The node comprises two sockets (Socket L#0 and Socket L#1). Each socket is associated with its own NUMA node and L3 cache.

NUMA Nodes and Memory Distribution

  • NUMA Node L#0: 120GB of memory.
  • NUMA Node L#1: 120GB of memory.
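On Linux, the NUMA layout that lstopo reports can also be read directly from sysfs, which is handy inside scripts. The sketch below assumes the standard `/sys/devices/system/node/nodeN/cpulist` files and the kernel's range format (for example `0-71` or `0-3,8,10-11`); the helper names are hypothetical. On an NG-GRACE node one would expect two NUMA nodes with 72 CPUs each, but the exact OS CPU numbering is an assumption and should be checked against lstopo.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as '0-71' or '0-3,8,10-11' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty file (memory-only NUMA node) or trailing comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_cpus(sysfs: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its CPU ids by reading sysfs (Linux only)."""
    mapping: dict[int, list[int]] = {}
    node_dirs = sorted(Path(sysfs).glob("node[0-9]*"),
                       key=lambda p: int(p.name[len("node"):]))
    for node_dir in node_dirs:
        node_id = int(node_dir.name[len("node"):])
        mapping[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return mapping


if __name__ == "__main__":
    for node, cpus in numa_cpus().items():
        if cpus:
            print(f"NUMA node {node}: {len(cpus)} CPUs ({cpus[0]}-{cpus[-1]})")
        else:
            print(f"NUMA node {node}: memory only, no CPUs")
```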

Cache Hierarchy

Both sockets have the same cache hierarchy:

  • L3 Cache: 114MB per socket.
  • L2 Cache: 1MB (1024KB) per core.
  • L1 Cache: Each core has two types of L1 cache:
    • L1d Cache (Data): 64KB
    • L1i Cache (Instructions): 64KB
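A practical use of these sizes is estimating where a core's working set will live. The sketch below uses the figures listed above and treats them as binary units (KiB/MiB), which is an assumption (lstopo normally prints binary sizes labelled "KB"/"MB"). It also simplifies the shared L3 by dividing it evenly among the active cores of one socket; real sharing behaviour is more complex. The function name is hypothetical.

```python
# Cache sizes from the lstopo summary; KiB/MiB interpretation is an assumption.
L1D_BYTES = 64 * 1024              # private per core
L2_BYTES = 1024 * 1024             # private per core
L3_BYTES = 114 * 1024 * 1024       # shared by the cores of one socket
CORES_PER_SOCKET = 72


def smallest_fitting_level(working_set_bytes: int, active_cores: int = 1) -> str:
    """Return the innermost cache level that can hold one core's working set.

    L1d and L2 are private, so they are compared directly. The L3 capacity is
    split evenly among the active cores on the socket (a simplification).
    """
    if working_set_bytes <= L1D_BYTES:
        return "L1d"
    if working_set_bytes <= L2_BYTES:
        return "L2"
    cores = max(1, min(active_cores, CORES_PER_SOCKET))
    if working_set_bytes <= L3_BYTES // cores:
        return "L3"
    return "DRAM"


if __name__ == "__main__":
    matrix_bytes = 512 * 512 * 8  # one 512x512 float64 matrix = 2 MiB
    print(smallest_fitting_level(matrix_bytes, active_cores=1))   # L3
    print(smallest_fitting_level(matrix_bytes, active_cores=72))  # DRAM
```

The example shows why the same data structure can behave very differently depending on how many cores share the socket: with one active core the matrix fits in L3, but with all 72 cores active each core's share of L3 is only about 1.6MiB.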

Cores and Processing Units (PUs)

Each socket contains 72 cores, totaling 144 cores across the two sockets. The cores are structured as follows:

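  • Each core has its own private L1 (instruction and data) and L2 caches; the L3 cache is shared by all cores in a socket.
  • Each core contains a single Processing Unit (PU), labelled PU L#n in the lstopo output. Grace does not implement simultaneous multithreading, so the node exposes 144 hardware threads, one per core.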
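Because every core is a single PU and each socket is its own NUMA node, mapping a CPU to its socket is simple arithmetic, provided the OS numbers CPUs contiguously (0-71 on socket 0, 72-143 on socket 1). That numbering is an assumption: lstopo's L# indices are logical and may differ from OS indices, so verify with lstopo or sysfs first. The helper names below are hypothetical; `os.sched_setaffinity` is the standard Linux-only Python call for restricting a process to a CPU set.

```python
import os

CORES_PER_SOCKET = 72
SOCKETS = 2


def socket_of(cpu: int) -> int:
    """Socket (and NUMA node) index for an OS CPU id, assuming contiguous numbering."""
    if not 0 <= cpu < CORES_PER_SOCKET * SOCKETS:
        raise ValueError(f"cpu {cpu} out of range")
    return cpu // CORES_PER_SOCKET


def cpus_of_socket(socket: int) -> set[int]:
    """All OS CPU ids on one socket, under the same contiguous-numbering assumption."""
    if not 0 <= socket < SOCKETS:
        raise ValueError(f"socket {socket} out of range")
    start = socket * CORES_PER_SOCKET
    return set(range(start, start + CORES_PER_SOCKET))


def pin_to_socket(socket: int) -> None:
    """Restrict the calling process to the CPUs of one socket (Linux only)."""
    os.sched_setaffinity(0, cpus_of_socket(socket))


if __name__ == "__main__":
    print(socket_of(71), socket_of(72))  # 0 1
    # Only pin when running on a machine that looks like a 144-core Grace node.
    if hasattr(os, "sched_setaffinity") and os.cpu_count() == 144:
        pin_to_socket(0)
```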

Summary

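In summary, each node combines two 72-core NVIDIA Grace processors, 240GB of LPDDR5 memory split across two NUMA nodes, and a large per-socket L3 cache. This makes it well suited to memory-bandwidth-bound and highly multi-threaded workloads, provided that threads stay close to the memory they use: because each socket is its own NUMA node, accesses to the other socket's memory take a slower path.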
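One common way to respect the NUMA layout in hybrid MPI + OpenMP jobs is to give each rank a contiguous block of cores that never crosses the socket boundary. The sketch below computes such a layout for one node. It assumes contiguous OS CPU numbering (socket 0 = CPUs 0-71, socket 1 = CPUs 72-143) and an even number of ranks per node; both are assumptions, and the function is a hypothetical planning aid, not a Slurm or MPI feature. Translating the ranges into launcher binding options is left to the scheduler's own documentation.

```python
CORES_PER_SOCKET = 72
SOCKETS = 2


def rank_core_ranges(ranks_per_node: int, threads_per_rank: int) -> list[range]:
    """Spread ranks evenly over the sockets, giving each a contiguous core block.

    Ranks 0..k-1 go to socket 0 and ranks k..2k-1 to socket 1, so no rank's
    threads straddle the NUMA boundary. Raises ValueError for layouts that
    cannot satisfy this.
    """
    if ranks_per_node <= 0 or threads_per_rank <= 0:
        raise ValueError("ranks and threads must be positive")
    if ranks_per_node % SOCKETS:
        raise ValueError(f"ranks_per_node must be a multiple of {SOCKETS}")
    ranks_per_socket = ranks_per_node // SOCKETS
    if ranks_per_socket * threads_per_rank > CORES_PER_SOCKET:
        raise ValueError("layout needs more cores than a socket has")
    ranges = []
    for rank in range(ranks_per_node):
        socket, slot = divmod(rank, ranks_per_socket)
        start = socket * CORES_PER_SOCKET + slot * threads_per_rank
        ranges.append(range(start, start + threads_per_rank))
    return ranges


if __name__ == "__main__":
    for rank, cores in enumerate(rank_core_ranges(8, 18)):
        print(f"rank {rank}: cores {cores.start}-{cores.stop - 1}")
```

Placing ranks socket by socket, rather than packing them from CPU 0 upward, keeps both sockets' memory controllers busy even when the node is not fully populated (for example, 2 ranks × 36 threads uses cores 0-35 and 72-107).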