IBM-BSC - Natural Language Processing

Natural Language Processing


The IBM-BSC Joint Study Agreement (JSA) on Natural Language Processing (NLP) is motivated by the relevance of language technologies in the clinical domain and by the need for high-quality linguistic and semantic annotations that clinical decision support systems and clinical research can use. The main research topic of the project is the industrialization and scale-out of NLP components for clinical texts based on cTAKES, a UIMA-based framework. cTAKES has been used in a variety of biomedical use cases, such as phenotype discovery, translational science, pharmacogenomics and pharmacogenetics, and it is a reference framework for clinical NLP in the United States. To ensure the success of such a reference platform for clinical NLP in Spanish, and to achieve quick and wide adoption among clinical NLP developers, the proposal first requires two essential tasks:

  1. The development of relevant components for the Spanish language, so that the new platform is equipped with a good number of high-quality components that ease its adoption by the Spanish clinical NLP community, especially companies and hospitals.
  2. The definition, deployment and validation of a cTAKES Java-Python bridge that provides interoperability and communication between the Java-based UIMA platform and the plethora of new Python and deep learning libraries and methods.

In addition, BSC will work on generating contextual language models for the general and medical domains in Spanish, and for Spain's other co-official languages, using very large amounts of textual data. These pre-trained models will be used to train new ‘task and/or domain-oriented’ models used by the components of the platform.