New release of the DLB library that provides an interface to manage the resources assigned to a running process

08 January 2018
The most relevant feature is the novel DROM module, which provides an interface that lets an external entity manage the computational resources assigned to processes while they are running.
 

The Computer Sciences department at BSC is proud to announce the release of DLB 2.0 (Dynamic Load Balance). The most relevant feature of this version is the novel DROM (Dynamic Resource Ownership Management) module. This module provides an interface that lets an external entity (e.g. a job scheduler or resource manager) manage the computational resources assigned to processes during their execution.

Additionally, LeWI, the module in charge of improving the load balance between processes, has been extended. Through an extension of its API, LeWI can now communicate with the parallel runtimes in an asynchronous mode.

Marta Garcia, lead scientist of the DLB tool, says: “Fixing load imbalance on applications is not only important to improve a single application’s performance, but it is also key to boost the utilization of supercomputing systems”.

“DLB has tremendous potential to address, for free, the imbalance issues in hybrid applications that would otherwise require significant refactoring efforts. It also relieves programmers from having to worry about the right configuration of processes and threads in hybrid codes” says Jesús Labarta, Computer Sciences department director.

DLB is currently helping to improve load balance in several European projects, such as the Human Brain Project, HPC-Europa3, Mont-Blanc 3, POP and INTERTWinE. It is also used by a wide range of applications in different domains, such as neuroscience, computational mechanics, molecular dynamics, cosmological simulations and climate modeling.

“DLB is our preferred tool to mitigate imbalances occurring in Alya executions. These imbalances appear spontaneously or come from inaccurate load distributions. DLB solves both problems at runtime, acting only when necessary, making our code much more resilient on modern HPC systems. We save millions of CPU hours every year by using DLB” says Ricard Borrell, senior researcher from the Alya development team.

The DLB library is now organized in two modules, LeWI and DROM, which are independent of each other but can work in coordination.

More specifically, apart from several bug fixes, this new version introduces the following features:

1. DROM (Dynamic Resource Ownership Management) module.

- DROM offers an API for external entities (e.g. a job scheduler or resource manager). It allows them to remove CPUs from a running process and assign them to a new or an existing process.

2. Asynchronous version of LeWI (Lend When Idle) load balancing algorithm.

- The load balancing algorithm LeWI can work in either a synchronous or an asynchronous mode. In the new asynchronous mode, the runtime and DLB interact without polling.

3. New DLB public API

- Clearer, with unified naming

- More exhaustive, supporting more use cases

4. Callback system for parallel runtimes

- The callback system allows registering functions as callbacks for DLB actions, providing a friendly interface for integrating new parallel runtimes with DLB.

5. Support for interoperability of multiple runtimes

- DLB provides support for several parallel runtimes within the same process sharing computational resources.

6. New mechanism to set DLB options based on the DLB_ARGS environment variable

- All the options passed to DLB are now contained in a single environment variable, which makes DLB easier to configure and makes errors in the options easier to detect.
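
To make features 1, 2 and 4 more concrete, the following is a minimal conceptual sketch in Python. It is a toy model, not the DLB API: the class `ToyResourceManager`, its methods and the process names are all hypothetical and exist only for illustration. It shows an external entity moving CPUs from one running process to another, with each runtime notified through callbacks it registered beforehand instead of polling for changes.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Toy model only: none of these names belong to the real DLB library.
Callback = Callable[[int], None]
Event = Tuple[str, str, int]


class ToyResourceManager:
    """Tracks which CPUs each process owns and notifies runtimes of changes."""

    def __init__(self) -> None:
        self.owned: Dict[str, Set[int]] = {}
        self.on_enable: Dict[str, Callback] = {}
        self.on_disable: Dict[str, Callback] = {}

    def attach(self, name: str, cpus: Iterable[int],
               on_enable: Callback, on_disable: Callback) -> None:
        """A process registers its initial CPUs and its two callbacks."""
        self.owned[name] = set(cpus)
        self.on_enable[name] = on_enable
        self.on_disable[name] = on_disable

    def move_cpus(self, src: str, dst: str, cpus: Iterable[int]) -> List[int]:
        """An external entity moves CPUs from one process to another."""
        # Only CPUs that the source process actually owns can be moved.
        moving = sorted(set(cpus) & self.owned[src])
        for cpu in moving:
            self.owned[src].discard(cpu)
            self.on_disable[src](cpu)  # source runtime is told at once, no polling
            self.owned[dst].add(cpu)
            self.on_enable[dst](cpu)   # destination runtime can start using the CPU
        return moving


def demo() -> Tuple[Dict[str, Set[int]], List[Event]]:
    events: List[Event] = []

    def recorder(name: str, action: str) -> Callback:
        # Stand-in for a runtime reacting to the change (e.g. resizing a thread pool).
        return lambda cpu: events.append((name, action, cpu))

    manager = ToyResourceManager()
    manager.attach("solver", range(4),
                   recorder("solver", "enable"), recorder("solver", "disable"))
    manager.attach("analysis", [],
                   recorder("analysis", "enable"), recorder("analysis", "disable"))
    manager.move_cpus("solver", "analysis", [2, 3])
    return manager.owned, events


if __name__ == "__main__":
    owned, events = demo()
    print(owned)
    print(events)
```

In this sketch the callbacks are invoked directly by the manager, which is what distinguishes a callback-driven interaction from a polling one, where each runtime would have to ask periodically whether its set of CPUs had changed.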

You can freely download DLB (distributed as open source under the LGPL-3.0 license) and get more information at DLB’s website: https://pm.bsc.es/dlb