Aitor Gonzalez Aguirre

Biography

Aitor Gonzalez-Agirre is the team leader of Language Modelling in the Language Technologies Unit of the Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC). This team is responsible for the development of Large Language Models and their applications in Natural Language Processing tasks.

He received his M.S. degree and his Ph.D. in Natural Language Processing from the University of the Basque Country (UPV/EHU). He received the 2017 Best Thesis Award at the SEPLN Conference (Sociedad Española para el Procesamiento del Lenguaje Natural), held in Seville from 19 to 21 September 2018.

He is part of the Plan de Impulso de las Tecnologías del Lenguaje (PlanTL) and AINA projects, two initiatives to increase the number, quality and availability of language infrastructures in Spanish, Catalan and the other co-official languages of Spain. He has worked in the biomedical and legal domains as a member of the Text Mining Unit of the BSC. He has previous experience organizing popular shared tasks, such as Semantic Textual Similarity (STS) at SemEval/*SEM 2012, 2013, 2014, 2015 and 2016, along with related tasks such as Typed-Similarity at SemEval-2013 and Interpretable STS at SemEval-2015 and 2016. He has also co-organized Biomedical Abbreviation Recognition and Resolution 2nd Edition (BARR2) at IberEval 2018, the Medical Document Anonymization task (MEDDOCAN) at IberLEF 2019, the Pharmacological Substances, Compounds and proteins Named Entity Recognition track (PharmaCoNER) at BioNLP-OST 2019, and the MESINESP and CodiEsp tasks at CLEF-2020.

More recently, he has focused on the development of Large Language Models. As part of the Language Technologies Unit, he has been involved in the creation of language models for Spanish and Catalan, as well as for the biomedical and legal domains. He received the Archiletras Innovation Award (2022) for the work "MarIA: Spanish Language Models".

He has also been involved in the creation and enrichment of multilingual lexical knowledge bases and other resources, such as the Multilingual Central Repository 3.0 (MCR), the eXtended WordNet Domains (XWD), the MeSpEN resource and the Medical and Legal Word Embeddings for Spanish and Catalan.

Education

  • Ph.D. in Computer Science, University of the Basque Country (July 2017)
  • M.S. in Language Analysis and Processing (September 2012)
  • M.S. in Advanced Computer Systems (September 2011)
  • B.S. in Computer Science (June 2010)