BSC's MarIA project wins the Archiletras de la Lengua award for innovation

15 July 2022

MarIA is a massive artificial intelligence system, expert in understanding and writing Spanish, built from the digital documentary heritage of the National Library of Spain.

The MarIA project has been awarded the Innovation Prize at the first edition of the Archiletras de la Lengua awards. MarIA is the language modeling system created at the Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) from the web archives of the National Library of Spain (BNE), within the framework of, and funded by, the Language Technologies Plan of the State Secretariat for Digitalization and Artificial Intelligence (SEDIA).

The annual Archiletras awards, organized by the publishing house Prensa y Servicios de la Lengua, recognize achievements in the promotion, support, research and development of the Spanish language or of any of the other languages in contact with Spanish in any of its territories.

The award was presented last Thursday at Casa América in Madrid, in a ceremony in which the BSC was represented by Marta Villegas, head of the project and leader of the BSC's Text Mining Unit, who received the award from Carme Artigas, Secretary of State for Digitalization and Artificial Intelligence.

"Receiving the Archiletras award makes us especially happy. It is an honor and a recognition of the team of enthusiastic professionals at the BSC who, in collaboration with the BNE and SEDIA, have worked to ensure that Spanish has sufficient, high-quality language resources," said Marta Villegas.

In the jury's final vote, MarIA prevailed over the other two finalists: the mobile application Dialectos del Español, designed to detect and predict general and characteristic features of all the dialects of the Spanish-speaking world, and Euskal Herriko Ahotsak (Voices of the Basque Country), a project that compiles and disseminates the Basque oral and dialectal cultural heritage.

MarIA places the Spanish language among the languages with massive open access models.

The MarIA project is a massive artificial intelligence system with expertise in understanding and writing Spanish. Because of its size and capabilities, it has placed Spanish in the group of languages with massive open access models, after English and Mandarin.

A language model is an artificial intelligence system formed by a deep neural network that has been trained to acquire an understanding of a language, its lexicon and its mechanisms for expressing meaning, and to write like a human.

These complex statistical models, which relate words in texts in a systematic and massive way, are able to "understand" not only abstract concepts, but also their context. With these models, developers of different applications can create tools for multiple uses, such as classifying documents or creating proofreaders or translation tools.

MarIA has been built from the digital documentary heritage of the BNE, which crawls and archives Spanish-language websites, and has been trained on the BSC's MareNostrum 4 supercomputer. It is published openly so that application developers, companies, research groups and society in general can use it for a wide range of purposes.

MarIA's latest advances are a milestone toward the objectives of the National Artificial Intelligence Strategy and the Recovery, Transformation and Resilience Plan, with which Spain aims to lead the world in developing tools, technologies and applications for the projection and use of the Spanish language in AI.

MarIA is also linked to the Strategic Project for the Recovery and Economic Transformation (PERTE) New Economy of the Language, proposed as an opportunity to harness the potential of Spanish and the co-official languages as drivers of economic growth and international competitiveness in areas such as artificial intelligence, translation, learning, cultural dissemination, audiovisual production, research and science.