BSC releases PyCOMPSs/COMPSs version 3.0

20 June 2022

The Barcelona Supercomputing Center offers the HPC community COMPSs, a set of tools that helps developers efficiently program and execute their applications on distributed computational infrastructures.

This COMPSs release includes several new features such as transparent checkpointing support, automatic creation of data provenance information, and support for MPMD MPI applications.

The Python binding comes with Mypy compilation and a new CLI that unifies the execution of applications across different environments.

The Workflows and Distributed Computing team at the Barcelona Supercomputing Center is proud to announce a new release, version 3.0 (codename Lavender), of the programming environment COMPSs.

This version of COMPSs builds on the team’s work in recent years to provide a set of tools that helps developers program and execute their applications efficiently on distributed computational infrastructures such as clusters, clouds and container-managed platforms. COMPSs is a task-based programming model known for notably improving the performance of large-scale applications by automatically parallelizing their execution.

COMPSs has been available to users of the MareNostrum supercomputer and the Spanish Supercomputing Network for several years, and it has been adopted in several research projects such as EUBra-BIGSEA, MUG, EGI, ASCETIC, TANGO, NEXTGenIO, I-BiDaaS, mF2C, CLASS, ExaQUte, ELASTIC, the BioExcel CoE, LANDSUPPORT, the EXPERTISE ETN and the Edge Twins HPC FET Innovation Launchpad project. In these projects, COMPSs has been applied to implement use cases provided by different communities across diverse disciplines such as biomedicine, engineering, biodiversity, chemistry, astrophysics, finance, telecommunications, manufacturing and earth sciences. It is currently being extended and adopted in applications in the AI-SPRINT, PerMedCoE and CAELESTIS projects, and it has also been applied in sample use cases of the ChEESE CoE. Of special note is the eFlows4HPC project, coordinated by the group and started in January 2021, which aims to develop a workflow software stack in which the PyCOMPSs/COMPSs environment is one of the main components.

The new release includes features that improve the fault tolerance and reproducibility of applications.

Transparent task-based checkpointing support has been added, making it possible to recover failed executions. The system leverages the determinism of tasks to avoid re-executing non-failed tasks after a breakdown, by automatically copying their outputs as the execution progresses. Performing such copies entails a significant overhead on network and storage operations; the optimal balance between resilience and performance depends on each execution and on the preferences of the end user. For this reason, the system combines different mechanisms that systematically select which output values to checkpoint, and it allows these decisions to be customized through mechanisms for defining new policies. Besides systematic copies, the system also provides application developers with a method to set specific points in the application code at which to checkpoint the execution status.
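The idea of task-level checkpointing can be illustrated with a small, self-contained Python sketch. It does not use the COMPSs API: the `TaskCheckpointer` class, its `policy` callback and the `demo` workflow are hypothetical names introduced only for this example. The sketch assumes tasks are deterministic and that their outputs can be serialized as JSON.

```python
import json
import os
import tempfile


class TaskCheckpointer:
    """Toy task-level checkpoint store (illustrative only, not the COMPSs API)."""

    def __init__(self, path, policy=lambda task_id: True):
        self.path = path
        self.policy = policy  # hypothetical hook: decides which outputs are copied
        self.saved = {}
        if os.path.exists(path):
            with open(path) as f:
                self.saved = json.load(f)

    def run(self, task_id, func, *args):
        # Tasks are assumed deterministic, so a saved output replaces re-execution.
        if task_id in self.saved:
            return self.saved[task_id]
        result = func(*args)
        if self.policy(task_id):
            self.saved[task_id] = result
            self._flush()
        return result

    def _flush(self):
        # Write to a temporary file and rename it, so a crash cannot leave
        # a partially written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.saved, f)
        os.replace(tmp, self.path)


def demo():
    """Run a three-task workflow that fails once, then recover from the checkpoint."""
    calls = []  # records every task that is actually executed

    def square(x):
        calls.append("square")
        return x * x

    def add(a, b, fail):
        calls.append("add")
        if fail:
            raise RuntimeError("simulated failure")
        return a + b

    def workflow(ckpt, fail):
        a = ckpt.run("t1", square, 3)
        b = ckpt.run("t2", square, 4)
        return ckpt.run("t3", add, a, b, fail)

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        try:
            workflow(TaskCheckpointer(path), fail=True)
        except RuntimeError:
            pass
        executed_first_run = len(calls)
        # A fresh checkpointer reloads the saved outputs; only "t3" runs again.
        result = workflow(TaskCheckpointer(path), fail=False)
        return result, executed_first_run, len(calls)
```

In this sketch the first run executes all three tasks and fails on the last one, while the second run executes only the failed task. Passing a more selective `policy` reduces the number of copies, at the cost of re-executing more tasks after a failure, which is the trade-off described above.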

COMPSs has also been extended with a mechanism that automatically records data provenance from the execution of COMPSs applications. This enables the reproducibility and replicability of COMPSs applications and dislib algorithms. The approach is based on the RO-Crate 1.1 specification, which offers good integration with existing tools and frameworks. The COMPSs runtime has been modified to generate RO-Crates: a logger registers unique accesses to files in order to automatically identify the inputs and outputs of the workflow, and a post-processing step extracts the required information from the logger to generate the RO-Crate.
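As a rough illustration of the post-processing step, the following Python sketch turns a log of file accesses into RO-Crate 1.1 metadata. It is not the COMPSs implementation: the `build_ro_crate` function, the `(path, mode)` log format and the `#run` identifier are assumptions made for this example, and recording inputs and outputs through a `CreateAction` is only one possible encoding.

```python
import json


def build_ro_crate(access_log, name="Workflow run"):
    """Build RO-Crate 1.1 metadata from a hypothetical access log.

    access_log is a list of (path, mode) tuples in execution order,
    where mode is "r" for a read and "w" for a write.
    """
    inputs, outputs, written = [], [], set()
    for path, mode in access_log:
        if mode == "w":
            written.add(path)
            if path not in outputs:
                outputs.append(path)
        elif path not in written and path not in inputs:
            # Read before any task wrote it: treat it as a workflow input.
            inputs.append(path)

    files = inputs + [p for p in outputs if p not in inputs]
    graph = [
        {   # Metadata descriptor required by the RO-Crate 1.1 specification.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset listing every file that was accessed.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "hasPart": [{"@id": p} for p in files],
        },
        {   # One way to record which files were consumed and produced.
            "@id": "#run",
            "@type": "CreateAction",
            "object": [{"@id": p} for p in inputs],
            "result": [{"@id": p} for p in outputs],
        },
    ]
    graph += [{"@id": p, "@type": "File"} for p in files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


log = [("in.csv", "r"), ("tmp.dat", "w"), ("tmp.dat", "r"), ("out.csv", "w")]
crate = build_ro_crate(log)
metadata_json = json.dumps(crate, indent=2)  # content of ro-crate-metadata.json
```

Here `in.csv` is classified as an input because it is read before anything writes it, while `tmp.dat` and `out.csv` are classified as outputs. A real implementation would also need to describe the workflow itself and distinguish intermediate files from final results.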

Other developments have been carried out in the framework of the eFlows4HPC project to enable the convergence of HPC, AI and data analytics. Support for MPMD MPI applications as tasks is now provided, as well as support for task prologs and epilogs, and generic support for reusable descriptions of external software executions inside a COMPSs task. To unify the execution of applications across different computing environments, a new Command Line Interface (CLI) has been designed and implemented.

Other enhancements are Mypy compilation of the Python binding to speed up Python executions, integration with DLB DROM to improve affinity in OpenMP tasks, and 64-bit RISC-V support.

COMPSs 3.0 comes with other minor new features, extensions and bug fixes.

COMPSs had around 1000 downloads last year and is used by around 20 groups in real applications. COMPSs has recently attracted interest from areas such as engineering, image recognition, genomics and seismology, for which specific courses and dissemination activities have been carried out.

The packages and the complete list of features are available on the Downloads page. A Docker image is also available for testing the functionalities of COMPSs through a step-by-step tutorial that guides the user in developing and executing a set of example applications.

Additionally, a user guide and papers published in relevant conferences and journals are available.

The Workflows and Distributed Computing team at the Barcelona Supercomputing Center aims to offer tools and mechanisms that enable the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources in a transparent way. The team's research builds on the group's previous expertise and extends it towards the aspects of distributed computing that can benefit from it. The team has a strong focus on programming models, resource management and scheduling in distributed computing environments.