SORS: Algorithms, Software, and Hardware Accelerators for the Next Wave of Genomic Data

Date: 26/Jun/2025 Time: 10:00

Place: [HYBRID] Room 1-3-2, BSC Main Building and online via Zoom

Time zone: Europe/Madrid

For details, see the event link: https://www.bsc.es/es/research-and-development/research-seminars/sors-algorithms-software-and-hardware-accelerators-the-next-wave-genomic-data

Abstract

In this talk, I will discuss how emerging fields such as pangenomics, pathogen surveillance, wastewater epidemiology, comparative genomics, and metagenomics are producing new waves of genomic data and applications. I will also discuss the computational and storage challenges this data presents and how, at the Turakhia Lab, we are addressing them with a combination of new algorithms, software, FPGAs, GPUs, and high-performance computing (HPC) solutions.

In pangenomics, we introduced PanMAN, a compact and unified data representation that integrates phylogeny, mutational history, genomic variation, and whole-genome alignments, making it the first of its kind. PanMAN was used to construct the largest SARS-CoV-2 pangenome currently available, comprising over 8 million sequences yet requiring only 366 MB of disk space. This was enabled in part by TWILIGHT, our GPU-accelerated multiple sequence aligner, which offers orders-of-magnitude speedups and scales far beyond existing tools.

For pathogen surveillance, we developed the UShER toolkit, which enabled real-time SARS-CoV-2 genomic surveillance and epidemiological research at a global scale during the COVID-19 pandemic and has contributed to the designation of over 4,000 lineages. Building on UShER, we recently created WEPP, a novel HPC tool that significantly enhances the resolution and timeliness of wastewater-based epidemiology and is enabling powerful new applications.

In comparative genomics, we developed ROADIES, an HPC software tool that fully automates accurate species tree inference from raw genome assemblies. ROADIES is transforming large-scale phylogenetic studies and is currently being used to analyze assemblies from the Vertebrate Genomes Project (VGP). Lastly, I will share my vision for how hardware accelerators can drive the next wave of innovation in bioinformatics, including some of our work based on high-level synthesis.

Short Bio

Dr. Yatish Turakhia is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of California San Diego (UCSD), with affiliations in the Department of Computer Science and Engineering (CSE) and the Bioinformatics and Systems Biology (BISB) graduate program. Prior to joining UCSD, he was a postdoctoral scholar at the Genomics Institute, UC Santa Cruz. Dr. Turakhia earned his Ph.D. in Electrical Engineering from Stanford University in 2019 and his bachelor’s and master’s degrees in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay in 2014. He is a recipient of the MIT Technology Review’s Innovators Under 35 award, Hellman Fellowship, Jacobs Early Career Award, Amazon Research Award, NVIDIA Graduate Fellowship, and multiple paper awards.

Speakers

Speaker: Dr. Yatish Turakhia, Assistant Professor in the Department of Electrical and Computer Engineering at the University of California San Diego (UCSD)
Hosts: Miquel Moreto and Santiago Marco Sola