Research Engineer - Fault Tolerance and Reliability for new Computer Architecture Hardware-Software co-design (R2)

Job Reference

125_21_CS_CAPP_R2

Position

Research Engineer - Fault Tolerance and Reliability for new Computer Architecture Hardware-Software co-design (R2)

Closing Date

Saturday, 31 July, 2021

 

About BSC
 
The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful supercomputers in Europe, and is a hosting member of the PRACE European distributed supercomputing infrastructure. The mission of BSC is to research, develop and manage information technologies in order to facilitate scientific progress. BSC combines HPC service provision and R&D into both computer and computational science (life, earth and engineering sciences) under one roof, and currently has over 700 staff from 49 countries.

 
Context And Mission
 
BSC is seeking a Research Engineer to work on a research project developing fault tolerance capabilities for a new RISC-V processor. The reliability techniques developed will involve work at both the hardware and the software level. In particular, we aim to demonstrate reliability strategies for AI applications. A Ph.D. student could also be considered if the candidate has a strong background in computer fault tolerance and resilience.
 
Key Duties
 
  • Develop runtime support for application-level checkpointing and automate recovery
  • Develop techniques for handling heterogeneous hardware, checkpointing both CPUs and FPGAs, as well as leveraging 3D-stacked memory
  • Develop a library API that is able to checkpoint scientific applications, as well as deep learning frameworks
  • Protect the critical processor structures, such as L1 Data and Instruction caches, L2 cache, TLB and register files with distinct error detection functionality (parity or lightweight ECC) according to their vulnerability and the target FIT rates for each structure
  • Write high-quality technical reports and papers
 
Requirements
 
  • Education
    • Master's degree in Computer Science or Computer Engineering
  • Essential Knowledge and Professional Experience
    • Previous hardware design experience at RTL level
    • High-speed low-power fault-tolerant digital design techniques
    • Understanding of computer architecture, preferably in high performance computing, processor micro-architecture, memory subsystem, storage
    • Knowledge of basic fault tolerance strategies such as checkpoint restart and error correcting codes
    • Experience with parallel and distributed applications (MPI+OpenMP)
    • Knowledge of continuous integration systems and good coding practices
  • Competences
    • Ability to work independently.
    • Ability to establish and develop research collaborations with external stakeholders.
    • Ability to present ideas and results in a precise and succinct way.
 
Conditions
 
  • The position will be located at BSC within the Computer Sciences Department
  • We offer a full-time contract, a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, private health insurance, and full support with relocation procedures
  • Duration: Temporary - 2 years renewable
  • Salary: we offer a competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona
  • Starting date: as soon as possible
 
Application Procedure
 
All applications must include:

  • A cover letter with a statement of interest in English, including two contacts for further references. Applications without this document will not be considered

  • A full CV in English including contact details


 
Deadline
 
The vacancy will remain open until a suitable candidate has been hired. Applications will be reviewed regularly and potential candidates will be contacted.
 
Diversity and Equal Opportunity Employment
 
BSC-CNS is an equal opportunity employer committed to diversity and inclusion. We are pleased to consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or any other basis protected by applicable state or local law.

 

Application Form

Please upload your CV using the following name structure: Name_Surname_CV
Files must be less than 3 MB.
Allowed file types: txt rtf pdf doc docx.
Please upload your cover letter using the following name structure: Name_Surname_CoverLetter
Files must be less than 3 MB.
Allowed file types: txt rtf pdf doc docx zip.
Please upload any other document using the following name structure: Name_Surname_OtherDocument
Files must be less than 10 MB.
Allowed file types: txt rtf pdf doc docx rar tar zip.
** Note that the information provided in relation to gender and nationality will be used solely for statistical purposes.