Aina launches the first speech solution to support the different varieties of Catalan

23 April 2024
The new Matxa model is now available to test and run on the Hugging Face open-source AI platform

The Aina artificial intelligence and language technologies project is celebrating Sant Jordi by releasing Matxa, the first speech synthesis model to cover the main dialect variants of Catalan. It is the first technological solution published as an open model that offers text-to-speech (TTS) synthesis in Central, North-Western, Balearic and Valencian Catalan. The Aina project is promoted and financed by the Generalitat de Catalunya.

The model is available to all users on Hugging Face, the AI community platform for open-source resources, where it can be tested and run. Developed by the Language Technologies Unit of the Barcelona Supercomputing Center – Centro Nacional de Supercomputación (BSC-CNS), it was trained on several datasets, including Festcat, OpenSLR69 and the recently created Frescat, which contains recordings of eight speakers across four dialect variants.

Matxa represents a step forward in performance because it preserves the naturalness and characteristics of the voices used to train it. It combines the Matcha-TTS and Vocos neural network architectures, which stand out for their novelty and very low execution time. The multi-dialect system was configured and trained on the new MareNostrum 5 supercomputer and on FinisTerrae III at the Galician Supercomputing Center (CESGA).

A public demo offers a first look at how Matxa works.

The new Frescat dataset is a pioneering development in the field of digital resources in Catalan, as it includes eight speakers with different characteristics: two voices for each of the four main dialects. The dataset will be made public in the coming weeks and will be available for all users to download and use. For Baybars Külebi, a BSC researcher specializing in speech, it is “an innovative resource that makes digital resources available to everyone that take into account the plurality of Catalan.”

The development of speech synthesis technologies opens the door to a wide range of possible applications. In fact, the Aina project, through the BSC, is already working with companies and institutions to offer specific solutions using artificial intelligence tools developed at the center.