AINA project goes on tour through Catalan-speaking territories to add voices from all dialectal variants

04 April 2022
The tour starts today in Perpignan, Northern Catalonia, and will pass through the Valencian Community and the Balearic Islands on April 7 and 11, respectively.

Vice President Puigneró will lead the project presentation events with the main cultural and language defense organizations of each of the territories

The objective is to add the voices of the variants spoken in these territories to the 'Our language is your voice' campaign, in order to obtain a voice corpus that covers the full linguistic diversity of spoken Catalan

AINA is a project based on data technologies and Artificial Intelligence promoted by the Vice Presidency and the Barcelona Supercomputing Center (BSC)

Today, the AINA project is launching a tour through Catalan-speaking territories with the aim of adding voices from the different dialectal variants of the Catalan language.

According to the latest available data, the dialectal variants least represented on Mozilla's Common Voice platform are Balearic (1%), North-Western (1%), Northern (3%) and Valencian (5%). AINA's tour aims to encourage speakers of these variants to join the voice collection campaign through the website, with the ultimate goal of significantly increasing these percentages to ensure that the resulting voice corpus covers the full linguistic diversity of the Catalan language.

A two-phase tour

Promoted by the Department of the Vice Presidency and of Digital Policies and Territory with the collaboration of different cultural and language defence organisations and associations from the Catalan-speaking territories, AINA's tour will take place in two phases. The first phase will stop in Catalunya Nord, the Valencian Community and the Balearic Islands.

The kick-off will take place today, 4 April, in Perpignan (Catalunya Nord), with a presentation of the project led by Vice President Jordi Puigneró, with the collaboration of the Pyrénées-Orientales Departmental Council and the Public Office of the Catalan Language (OPLC). The same week, on Thursday 7 April, the tour will stop in the city of Valencia (País Valencià), where a presentation event will be held in collaboration with Acció Cultural del País Valencià. Finally, on 11 April, the AINA tour will pass through Palma (Mallorca), with an event organised in collaboration with the Government of the Balearic Islands.

At these three events, both participants and attendees will be able to record their voices at the project's travelling voice collection point.

The AINA tour will continue before the summer with a second phase that will cover the whole of Catalonia, with events to present the project in one municipality in each of the eight Catalan vegueries.

More than 640,000 recorded voices

In mid-February, the Government of Catalonia launched the campaign 'Our language is your voice' within the framework of the AINA project, with the aim of collecting as many voices as possible to feed the first version of the Catalan voice corpus, which is essential to enable machines to understand and speak our language.

The campaign invites Catalan-speaking citizens of all ages, genders, conditions and origins to "give" their voice through the website, where everyone can read, record and validate an unlimited number of sentences, presented in groups of five, on Mozilla's Common Voice platform.

Thanks to the excellent reception of the campaign among citizens, in just over a month the number of recordings and hours has exceeded the most optimistic expectations: more than 640,000 voice clips (sentences) and more than 900 hours have been recorded since the start of the campaign. In addition, Catalan has become the language with the second-highest number of speakers on Common Voice worldwide (more than 25,500 people), behind only English.

However, the dominant dialectal variant on the platform is still Central Catalan, which accounts for more than 75% of recorded voices.

About the AINA project

Promoted by the Department of the Vice Presidency and of Digital Policies and Territory in collaboration with the Barcelona Supercomputing Center (BSC), AINA is a project based on data technologies and Artificial Intelligence to make it possible for machines to understand and speak Catalan.

This is a strategic project whose ultimate goal is to teach Catalan to machines so that citizens can interact with them and participate in the digital world in Catalan at the same level as speakers of a global language such as English, thus avoiding the digital extinction of the Catalan language.

To this end, the AINA project is building the Catalan corpus and language models to make it easier for technology companies to develop their specific solutions or services (translators, personal assistants, voice synthesizers, text classifiers, etc.) in our language.

The AINA project is part of the Government's digital strategy through two initiatives led by the Vice Presidency: the Artificial Intelligence Strategy of Catalonia (Catalonia.AI) and the Interdepartmental Board of Directors for the promotion of Catalan on the Internet and in advanced digital technologies.