BSC to develop multilingual models in Aranese through Aina

17 January 2024

The Institute of Aranese Studies - Aranese Academy of Occitan Language (IEA-AALO) will provide speech, text and metadata to the BSC for integration into the corpus of the Aina project

The Barcelona Supercomputing Center – Centro Nacional de Supercomputación (BSC-CNS) and the Institute of Aranese Studies - Aranese Academy of Occitan Language (IEA-AALO) have reached an agreement to develop artificial intelligence models in Aranese for the first time. To this end, the IEA-AALO will transfer speech, text and metadata to the BSC.

The agreement between the two centres is a key milestone for the incorporation of the Occitan language into the artificial intelligence systems developed by the Aina Project. The project is coordinated through the BSC's Language Technologies Unit.

Under the collaboration, the BSC will be able to integrate these data into the Aina corpus. The datasets, which are available on Hugging Face, are fundamental for training models and language technologies (LT).

For the IEA-AALO, this is a “step that can entail an important advance for the development of technologies in the Occitan language that can facilitate study and linguistic analysis, as well as greater dissemination and promotion of the language through text-writing or automatic correction applications, among others”, according to Jèp de Montoya, President of the IEA-AALO.

The Aina Project, led by the BSC and funded by the Generalitat de Catalunya, thus expands its range of collaborations beyond Catalan. Its strategic vision positions the initiative as a space for the promotion of languages with few digital resources.


All the information about the project and the development of the models and datasets is available on the Aina Tech project website.