BSC software makes Lenovo's new set of cooling technologies more energy efficient

31 July 2018

The Energy Aware Runtime software has been developed in the framework of the BSC-Lenovo collaboration project

BSC has contributed to the creation of a new set of Lenovo cooling technologies that can allow data centres to run up to 50% more efficiently without hindering performance or density. BSC participated in this release with its Energy Aware Runtime (EAR) software, developed as part of the BSC-Lenovo collaboration project. Named Neptune, the Lenovo technology was announced in June at ISC18, the international high-performance computing conference.

EAR is the software counterpart of the Neptune energy-efficient architecture. It brings together Lenovo's experience in energy management and BSC's experience in runtime systems, monitoring tools, and dynamic application characterization. The collaboration has also enabled EAR to make efficient use of the new hardware that Lenovo built into its SD650 water-cooled systems for high-frequency energy readings. Since the Leibniz Supercomputing Centre (LRZ) will adopt EAR as its energy management framework, it has started to collaborate actively with us to improve, for instance, EAR's security and to develop new EAR components for scalability.
“EAR guarantees applications will run at the most energy efficient frequency according to its dynamic characteristics, detected by the runtime library, and energy system configuration”, states Julita Corbalán, the BSC researcher who has developed this software. 

For his part, Miguel Terol, HPC Project Manager at Lenovo, states: "Lenovo is proud to work with BSC in the development of EAR. As the Exascale era is arriving, the efficient management of power consumption is key in the adoption of upcoming technologies. The brilliant design of EAR will help push the limits of large scale supercomputing deployments."

Features of EAR software

The EAR software is an energy management framework that includes, among other components, the EAR library and the EAR Global Manager (EARGM). The EAR library is a dynamic, transparent, and lightweight runtime library that controls the energy consumed by MPI jobs without requiring any application modification or user input. It guarantees efficient use of system energy: it can be configured either to boost energy-efficient applications or to save energy by reducing the frequency, up to a maximum performance degradation that EAR itself controls.

EAR dynamically identifies repetitive regions in parallel applications (outer loops) without any annotation or user input. The algorithm in charge of detecting these regions is called DynAIS, an innovative multi-level algorithm with very low overhead. Because EAR's internals are driven by DynAIS, EAR is able to evaluate its own decisions, which is one of the key differences between EAR and other solutions. Thanks to DynAIS, EAR dynamically computes the Application Signature, a very small set of metrics that characterizes application behaviour. The Application Signature, together with the hardware characterization (which we call the System Signature), forms the input to the power and performance models used by EAR. EAR uses a fully distributed frequency-selection design, which avoids interference and additional noise on the network or the file system.

Apart from the EAR library, the EAR framework includes the EARGM. This component controls the energy consumed by the system according to the system configuration. It can be configured to act as a system monitoring tool that reports warning messages, or to be proactive and automatically adapt system settings in coordination with the EAR library. Because the EAR library is aware of application characteristics, it can react to the different EARGM warning levels based on those characteristics and on the measured energy efficiency.
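DynAIS itself is not described in this article, so the following Python sketch is not DynAIS. It only illustrates the underlying idea of spotting a repeating region in the stream of MPI calls a job issues, using a deliberately simple single-level check: after each call, look for the smallest period p such that the last few p-long blocks of calls are identical. The integer event IDs, the function names, and the max_period and min_repeats parameters are hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def smallest_trailing_period(events: Sequence[int], max_period: int = 32,
                             min_repeats: int = 3) -> Optional[int]:
    """Return the smallest p <= max_period such that the last
    min_repeats * p events are min_repeats back-to-back copies of one
    length-p block, or None if there is no such p."""
    for p in range(1, max_period + 1):
        span = p * min_repeats
        if span > len(events):
            break  # longer periods would need even more history
        tail = events[len(events) - span:]
        # The tail is periodic with period p iff every element equals the
        # element at the same offset inside the first block.
        if all(tail[i] == tail[i % p] for i in range(span)):
            return p
    return None


def first_loop(stream: Sequence[int], max_period: int = 32,
               min_repeats: int = 3) -> Optional[Tuple[int, int]]:
    """Feed events one at a time, as a runtime would observe them, and
    return (index, period) at the first point a repeating region is found."""
    history: List[int] = []
    for idx, event in enumerate(stream):
        history.append(event)
        p = smallest_trailing_period(history, max_period, min_repeats)
        if p is not None:
            return idx, p
    return None


# Hypothetical IDs: 1 = a one-off setup call; 10, 11, 12 = the three MPI
# calls issued in every iteration of the application's outer loop.
stream = [1, 10, 11, 12, 10, 11, 12, 10, 11, 12, 10, 11, 12]
print(first_loop(stream))  # (9, 3)
```

In this made-up stream the three-call loop body is confirmed at index 9, once three full iterations have been seen. Rescanning the history after every call, as this sketch does, is much less efficient than the low-overhead, multi-level design described for DynAIS, and it does not handle nested loops.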
The combination of the EARGM and the EAR library makes EAR a cluster-wide solution for energy management.
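The article does not give EAR's power and performance models or its policy code. The Python sketch below only illustrates, under stated assumptions, the kind of decision described above: predict time and power at each candidate frequency from an application signature, then pick the lowest-energy frequency whose predicted slowdown stays within a maximum performance degradation, with EARGM warnings loosening that bound. The Signature fields, the toy model (only a CPU-bound fraction of time scales with 1/f, and dynamic power scales with f cubed), STATIC_POWER_W, and the rule that each warning level adds penalty_step to the allowed degradation are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumed frequency-independent share of node power, in watts (hypothetical).
STATIC_POWER_W = 100.0


@dataclass
class Signature:
    """Hypothetical, simplified application signature measured at f_ref."""
    time_s: float     # time per outer-loop iteration at f_ref
    power_w: float    # average node power at f_ref
    cpu_bound: float  # assumed fraction of time that scales with 1/f (0..1)


def predict(sig: Signature, f_ref: float, f: float) -> Tuple[float, float]:
    """Toy model: only the CPU-bound share of time stretches as 1/f, and the
    dynamic (non-static) share of power scales with f**3."""
    time_s = sig.time_s * (sig.cpu_bound * f_ref / f + (1.0 - sig.cpu_bound))
    dynamic_w = max(sig.power_w - STATIC_POWER_W, 0.0)
    power_w = STATIC_POWER_W + dynamic_w * (f / f_ref) ** 3
    return time_s, power_w


def select_frequency(sig: Signature, f_ref: float, freqs: Sequence[float],
                     max_penalty: float, warning_level: int = 0,
                     penalty_step: float = 0.05) -> float:
    """Return the frequency with the lowest predicted energy whose predicted
    slowdown versus f_ref stays within max_penalty, loosened by penalty_step
    for each (hypothetical) EARGM warning level."""
    allowed = max_penalty + warning_level * penalty_step
    time_limit = sig.time_s * (1.0 + allowed)
    # Baseline: stay at the reference frequency.
    best_f, best_energy = f_ref, sig.time_s * sig.power_w
    for f in freqs:
        time_s, power_w = predict(sig, f_ref, f)
        energy_j = time_s * power_w
        if time_s <= time_limit and energy_j < best_energy:
            best_f, best_energy = f, energy_j
    return best_f


freqs_ghz = [2.4, 2.2, 2.0, 1.8, 1.6]
memory_bound = Signature(time_s=1.0, power_w=300.0, cpu_bound=0.3)
compute_bound = Signature(time_s=1.0, power_w=300.0, cpu_bound=1.0)
for level in (0, 2):
    print(level,
          select_frequency(memory_bound, 2.4, freqs_ghz, 0.08, level),
          select_frequency(compute_bound, 2.4, freqs_ghz, 0.08, level))
# 0 2.0 2.4
# 2 1.6 2.2
```

With these made-up numbers, the mostly memory-bound signature is moved to a lower frequency than the CPU-bound one under the same 8% bound, and a higher warning level lets both drop further. This only echoes the application-aware reaction to EARGM warnings described above; the System Signature, which EAR also feeds into its models, is left out here for brevity.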

Although the EAR library can only be loaded with MPI jobs, the rest of the EAR components (not described here for simplicity) are valid for any type of application, making EAR a truly global solution suitable for production systems.