Data Engineer for Language and Translation Technologies (RE2)

Job Reference

234_24_LS_LT_RE2

Position

Data Engineer for Language and Translation Technologies (RE2)

Closing date

Friday, 17 May 2024

About BSC

The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful supercomputers in Europe. BSC was a founding and hosting member of the former European HPC infrastructure PRACE (Partnership for Advanced Computing in Europe), and is now a hosting entity for EuroHPC JU, the Joint Undertaking that leads large-scale investments and HPC provision in Europe. The mission of BSC is to research, develop and manage information technologies in order to facilitate scientific progress. BSC combines HPC service provision and R&D in both computer and computational science (life, earth and engineering sciences) under one roof, and currently has over 900 staff from 55 countries.


For this role, we are particularly interested in the strengths and lived experiences of women and underrepresented groups, to help us avoid perpetuating biases and oversights in science and IT research. In cases of equal merit, candidates of the under-represented sex will be favoured.

Context And Mission

The Language Technologies (LT) Unit at BSC has consolidated experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning. It has been entrusted by the Spanish and Catalan governments with developing essential open-source resources and technologies for Spanish and Catalan. In this context, the LT Unit is currently in charge of two flagship projects at the national and regional levels: the Spanish National Plan for the Advancement of Language Technology, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan and funded by the Catalan Digitalisation Department. In addition, the Unit participates in various EU-funded international projects.

The LT Unit at BSC is looking for a Data Engineer with experience in Natural Language Processing and/or Machine Translation.

The successful candidate will work in a highly sophisticated HPC environment, have access to state-of-the-art systems and computational infrastructures, and establish collaborations with experts in different areas at the local and international levels.

Key Duties

  • Collect language data as required by the projects carried out in the Unit.
  • Write language data processing scripts to clean and prepare data for ingestion by neural architectures.
  • Automatically annotate data using state-of-the-art language processing tools.
  • Manage corpora and language data according to the requirements specified in the Unit’s data management plan.
  • Monitor the application of data protection, licensing and security rules.
  • Perform quality control on collected data and metadata.
  • Coordinate with machine learning engineers to determine data requirements.
  • Write technical reports and project documentation in English, Spanish and Catalan.
  • Prepare research proposals and write scientific papers.
  • Coordinate external teams for data collection and data annotation.
  • Ensure that open licenses can be applied to datasets, and resolve related queries.

Requirements

  • Education
    • Degree in Applied Linguistics, Computer Science or a related discipline.
  • Essential Knowledge and Professional Experience
    • Demonstrated experience in NLP, MT or speech processing.
    • Excellent understanding of data administration and management functions (transfer, storage, analysis, distribution, exploration, etc.).
    • Proven experience working with large datasets and distributed file systems, SQL databases and metadata management.
    • Proven experience with UNIX/Linux environments, scripting languages and Python.
  • Additional Knowledge and Professional Experience
    • Demonstrated experience in developing open-source software and resources
    • Fluent in written and spoken English, Spanish and Catalan.
    • Strong understanding of linguistic concepts.
  • Competences
    • Ability to work independently and in a team to complete tasks on schedule.
    • Ability to work to set deadlines.

Conditions

  • The position will be located at BSC within the Life Sciences Department.
  • We offer a full-time contract (37.5 h/week), a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, private health insurance, and support with relocation procedures.
  • Duration: open-ended contract for technical and scientific activities, linked to the project and budget duration.
  • Holidays: 23 paid vacation days plus 24 and 31 December, as per our collective agreement.
  • Salary: we offer a competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona.
  • Starting date: 1 June 2024

Application procedure and process

All applications must be made through the BSC website and contain:

  • A full CV in English, including contact details.
  • A cover letter in English with a statement of interest, including two contacts for references. Applications without this document will not be considered.

    In accordance with the OTM-R principles, a gender-balanced recruitment panel is formed for every vacancy at the beginning of the process. After reviewing the content of the applications, the panel will start the interviews, with at least one technical and one administrative interview. A profile questionnaire as well as a technical exercise may be required during the process.

    The panel will make a final decision, and all candidates who were in contact with the panel will receive feedback on the acceptance or rejection of their profile.

    At BSC we seek continuous improvement in our recruitment processes. For any suggestions, feedback or complaints about our recruitment processes, please contact recruitment [at] bsc [dot] es.


  • Deadline

    The vacancy will remain open until a suitable candidate has been hired. Applications will be regularly reviewed and potential candidates will be contacted.

    OTM-R principles for selection processes

    BSC-CNS is committed to the principles of the Code of Conduct for the Recruitment of Researchers of the European Commission and to the Open, Transparent and Merit-based Recruitment (OTM-R) principles. These are applied to every potential candidate in all our processes, for example by creating gender-balanced recruitment panels and recognizing career breaks.
    BSC-CNS is an equal opportunity employer committed to diversity and inclusion. We are pleased to consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or any other basis protected by applicable state or local law.

    Application Form

    Please choose one of the following options and, if needed, describe it:
      • BSC Website
      • Euraxess
      • Spotify
      • HiPeac
      • LinkedIn
      • Networking/Referral: include who and how
      • Events (Forum, career fairs): include who and how
      • Through University: include the university name
      • Specialized website (Metjobs, BIB, other): include which one
      • Other social networks (Twitter, Facebook, Instagram, YouTube): include which one
      • Other (Glassdoor, ResearchGate, job search website and other cases): include which one
    Please upload your CV using the following file name structure: Name_Surname_CV
    Files must be smaller than 3 MB.
    Allowed file types: txt rtf pdf doc docx.
    Please upload your cover letter using the following file name structure: Name_Surname_CoverLetter
    Files must be smaller than 3 MB.
    Allowed file types: txt rtf pdf doc docx zip.
    Please upload any other document using the following file name structure: Name_Surname_OtherDocument
    Files must be smaller than 10 MB.
    Allowed file types: txt rtf pdf doc docx rar tar zip.
    ** Please note that the information provided regarding gender and nationality will be used solely for statistical purposes.