The Health Language Technology group at BSC develops AI and deep-learning-based natural language processing resources, including components, annotated data and protocols. We also define evaluation scenarios that let the global research community assess and monitor the quality of implementations through high-impact, open-data benchmark shared tasks (BioCreative, IberLEF, CLEF). Our text mining technologies exploit unstructured data of practical relevance for data analytics and predictive modelling applications, drawing on literature, clinical records, social media and patents, and covering content in English, Spanish and Catalan.

Our main research application scenarios cover HPC-empowered text mining systems developed for use cases in occupational health, cancer (including comorbidities and tumour morphology), COVID-19 and rare diseases, as well as gene regulatory networks, biomaterials and chemical entity information extraction, and drug-protein target literature mining.