Canada joins Federated EGA, marking first major expansion beyond Europe

11 March 2025

Federated EGA data is hosted at BSC facilities, showing the crucial role of supercomputing centres in large-scale data sharing and analysis for the benefit of global biomedical research

New partnership consolidates Federated EGA as the most significant global repository for secure sharing of human genomic and health data

In a major milestone for international biomedical research and the future of personalised medicine, the Canadian Genome-Phenome Archive (CGA) has joined the Federated European Genome-phenome Archive (Federated EGA), marking the federation’s first significant expansion outside of Europe.

Federated EGA is jointly managed by the Centre for Genomic Regulation (CRG) in Barcelona and EMBL’s European Bioinformatics Institute (EMBL-EBI) in the UK. It was built following the data governance model of the European Genome-phenome Archive, which is funded by the “la Caixa” Foundation.

The announcement coincides with the publication of a marker paper (3 March 2025) in Nature Genetics, which lays out the practical and regulatory challenges faced by federated data-sharing initiatives and puts forward Federated EGA’s vision to enable global discovery of and access to sensitive human ‘omics’ data.

“This is a landmark moment for the global scientific community,” says Prof. Arcadi Navarro, ICREA Research Professor at the Universitat Pompeu Fabra and Director of the EGA team at the Centre for Genomic Regulation.

“Global health challenges like pandemics and rare diseases do not respect national borders. By expanding beyond Europe and adding the Canadian node, we consolidate the Federated EGA as the world’s most comprehensive, secure, and diverse resource for genomic and health data to tackle these urgent questions. This will have a transformative impact on scientific discovery and personalised medicine initiatives around the world,” adds Dr. Navarro.

Federated EGA’s vision set out in Nature Genetics

Researchers worldwide rely on access to robust, diverse datasets to uncover insights into the mechanisms of human health and disease. By securely combining data across populations, countries, and institutions, scientists can improve the reliability and precision of their research findings.

Connecting different data repositories gives scientists access to larger datasets from individuals with different backgrounds, yielding crucial insights into how diseases manifest in varied populations. It also helps avoid duplication of effort, accelerating breakthroughs in diagnosis, treatment, and prevention strategies.

However, each country adheres to rigorous privacy and security protocols which can lead to a patchwork of different regulations that affect how health and genomic data is shared across borders. Initiatives like Federated EGA address these hurdles by maintaining data locally in each country, which allows researchers worldwide to discover, request, and in some cases analyse data in a secure environment.

The paper in Nature Genetics explains the practical and regulatory challenges faced by Federated EGA and its ability to foster global collaboration while accommodating diverse legal and ethical frameworks. The authors explain the different ways they have joined data nodes in Finland, Germany, Norway, Spain, Sweden, Poland and Portugal since Federated EGA was first created in 2022.

Canada is the first country outside of Europe to join Federated EGA

The paper in Nature Genetics explicitly states that the Federated EGA’s governance model is not limited to European countries and that the network intends to expand globally. Welcoming Canada into Federated EGA is a real-world example of the initiative’s global expansion in action.

The CGA becomes the latest national resource to connect its datasets to a global community. Its efforts will increase the volume and diversity of genomic information available and pave the way for entirely new studies that would otherwise be impossible.

Genomic and health data from Canadian biomedical research projects will be permanently archived and distributed through the CGA, a national service that adheres to rigorous privacy and security protocols. The CGA Node, part of the Pan-Canadian Genome Library, is overseen by Canada’s Michael Smith Centre for Genome Sciences at BC Cancer and has been established in collaboration with the Digital Research Alliance of Canada and CGEn, Canada’s national facility for genome sequencing and analysis. The initiative is supported by the Canadian Institutes of Health Research (CIHR) and the Canada Foundation for Innovation (CFI).

“This milestone partnership provides Canadian researchers with a swift, secure connection to global collaborators, fostering scientific and clinical innovation while continuing to uphold our commitment to privacy and compliance,” says Dr. Steven Jones, lead for the CGA node and Co-Director of Canada’s Michael Smith Centre for Genome Sciences at BC Cancer.

Data governance model inspired by Central EGA

Federated EGA was built following the data governance model of the European Genome-phenome Archive (EGA), which is hosted on the MareNostrum5 supercomputer at the Barcelona Supercomputing Center – Centro Nacional de Supercomputación (BSC-CNS). The information is also stored at EMBL-EBI’s headquarters in Hinxton, UK.

The EGA played a critical role during the COVID-19 pandemic, hosting and managing data for several large-scale studies. For instance, projects like the COVID-19 Host Genetics Initiative used it to securely share genomic and clinical data among international research groups, helping identify genetic factors linked to infection severity and outcomes. It has also helped reveal new causal variants in childhood cancers and discover genetic variants which increase the risk of ulcerative colitis.

As of February 2025, the EGA contains 16 PB of human health and genomic data, around three quarters the size of the entire US Library of Congress’ digital collections (21 PB). For reference, the file size of a picture taken by an average mobile phone nowadays typically ranges from 1.5 to 5 MB. Using the upper estimate, the data stored at the central EGA is equivalent to more than 3.2 billion mobile phone pictures.
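
As a rough check of the comparisons above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes), which the figures quoted here do not specify, and all variable names are hypothetical.

```python
# Assumption: decimal (SI) units; the article does not say which convention it uses.
PB = 10**15  # bytes in one petabyte
MB = 10**6   # bytes in one megabyte

ega_bytes = 16 * PB          # EGA holdings as of February 2025
congress_bytes = 21 * PB     # US Library of Congress digital collections
photo_upper_bytes = 5 * MB   # upper estimate for one mobile phone picture

# Share of the Library of Congress digital collections that the EGA represents.
ega_share = ega_bytes / congress_bytes

# Number of 5 MB pictures that would occupy the same space as the EGA.
photo_equivalent = ega_bytes // photo_upper_bytes

print(f"EGA / Library of Congress: {ega_share:.0%}")
print(f"Equivalent 5 MB pictures: {photo_equivalent:,}")
```

Under these assumptions the ratio comes to roughly 76% and the picture count to 3.2 billion. With binary units (2^50 and 2^20 bytes) the picture count would be somewhat higher, which is consistent with the "more than 3.2 billion" wording.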

The data belongs to around 18,000 different research studies carried out all over the world, with the most common type of study being related to cancer research. More than 25,000 scientists from academia and industry have requested access to the data within the repository since its creation in the year 2010.

A crucial element in meeting the ever-increasing global demand for EGA data lies in the high-performance computing (HPC) resources at the BSC. The centre’s infrastructure processes thousands of data requests from researchers worldwide and distributes huge volumes of data.

“By leveraging our HPC resources, we can efficiently cache, encrypt, and deliver the most actively used datasets, even at peak demand,” explains Sergi Girona, Operations Director at BSC. “In just the second quarter of 2024, we distributed a data volume roughly equivalent to 9,000 times the text in all 57 million pages of the English Wikipedia, demonstrating that robust computing capacity is the backbone of large-scale data sharing for the benefit of global biomedical research.”

The future of Federated EGA

Looking ahead, the Federated EGA plans to broaden its scope beyond genomic data to include clinical records, imaging studies, proteomic profiles, and even environmental information, also known as ‘multi-omics’ data.

"The continuous development of Federated EGA plays a crucial role in advancing European data infrastructures such as GDI (European Genomic Data Infrastructure) and EUCAIM (European Federation for Cancer Images). These efforts are vital for creating robust digital ecosystems that facilitate cross-border sharing of health data, accelerating scientific breakthroughs and enhancing AI applications in healthcare," commented Dr. Salvador Capella-Gutierrez, Spanish National Bioinformatics Institute (INB) Coordinator at BSC.

He continued, "Adapting to the European Health Data Space (EHDS) regulation represents one of the upcoming milestones for Federated EGA. This includes adopting Secure Processing Environments (SPEs) to enable scientists worldwide to analyse available data in a legally compliant manner. To make this possible, computational centres like BSC will be more essential than ever."

Indeed, this expansion will be key to unlocking the full potential of personalised medicine. By integrating many different data types under a single, secure framework, researchers can achieve a more comprehensive view of disease mechanisms and patient health. This, in turn, will enable more precise diagnoses, targeted treatments, and preventative measures that take into account an individual’s unique genetic background and environmental context.

At the same time, the Federated EGA will continue to grow its worldwide presence by partnering with new countries and research institutions. “Each new node adds unique population data, helping us pinpoint genetic markers of disease more accurately. This is critical for the development of targeted therapies and tailored preventative measures, accelerating the arrival of personalised medicine on a global scale,” concludes Dr. Luis Serrano, ICREA Research Professor and Director of the Centre for Genomic Regulation (CRG) in Barcelona.