Description
The project will build an Open European Family of Large Language Models (OpenEuroLLM) and will openly publish the data, the training sources, and the models. The models will cover all official European languages, as well as other socially and economically important languages.
The consortium brings together leading academic and commercial researchers and developers with proven track records. This enables it to set up the procedures and workflows this ambitious goal requires. The LLMs will be evaluated using standard LLM metrics and will comply with regulatory requirements, such as the AI Act and other legally binding rules. The data will come from sources worldwide, as collected by previous or parallel projects, and will be published openly.
The computing power needed to achieve these goals will come from the European HPC networks and centres, which will also provide technical expertise and support. The models and other outputs will be distributed so that they are easy and inexpensive to use for further development, fine-tuning, or any other purpose, especially by SMEs in Europe.
A community will be built to sustain the further development of LLMs and their commercial use in Europe.

Keywords: Artificial intelligence, Open Source Software, Large Language Models, Open Science, Trustworthy AI