EXTRACT: A distributed data-mining software platform for extreme data across the compute continuum

Status: Not started yet
Start: 01/01/2023
End: 31/12/2025

Description

Data has become one of the most valuable assets, driving the digital transformation across many sectors. Current data mining solutions are optimized to deal with specific data requirements, but fail to cope as the data characteristics become extreme. There is therefore an urgent need for novel and holistic approaches to enable the development, deployment and efficient execution of data mining workflows across a heterogeneous, secure and energy-efficient compute continuum, while fulfilling the diverse extreme data characteristics.

To fill this technological gap, EXTRACT will deliver a data-driven open-source software platform integrating the most relevant technologies, to facilitate the development of trustworthy, accurate, fair and green data mining workflows able to generate high-quality actionable knowledge. The EXTRACT platform will improve the complete lifecycle of extreme data mining workflows, significantly enhancing performance, energy-efficiency, scalability and security, while fulfilling the extreme data characteristics in a holistic way. Moreover, multiple computing technologies, from edge to cloud to HPC, will be integrated into a unified and secure compute continuum. Specifically, the platform will feature enhanced data infrastructures and AI & big-data frameworks, novel data-driven orchestration and distributed monitoring mechanisms, a unified continuum abstraction and cybersecurity and digital privacy across all software layers.

The EXTRACT platform will be validated in two real-world use cases with different extreme data requirements:

  1. Personalized Evacuation Route service, integrating data from the European data sources, Copernicus and Galileo, with 5G localization signals and smart city IoT sensors for civilian-centric crisis management; and
  2. Transient Astrophysics with a SKA pathfinder, processing extreme data from 2000 radio-telescopes for the real-time assessment of solar activity, generating knowledge for further scientific exploitation.

Funding