Text mining

The Biological Text Mining Unit focuses on the application and development of biomedical text mining technologies, which are becoming a key tool for efficiently exploiting information contained in unstructured data repositories, including the scientific literature, electronic health records (EHRs), patents, biobank metadata, clinical trials and social media. The unit has a particular interest in processing health-related clinical documents written in Spanish and other co-official languages, and in integrating molecular and biological information derived from the literature. Figure 1 provides a general overview of various text mining and language processing tasks.

The unit is fully funded through the “Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (PITL)”, in the framework of an agreement (“encomienda”) between the Secretary of State of Telecommunications of the Spanish Ministry of Energy, Tourism and the Digital Agenda (MINETAD) and CNIO.


The strategic goals of the Text Mining Unit are to:
  • Design and develop biomedical language-processing resources with emphasis on oncology.
  • Provide consultancy and technical advice for language technologies in the biomedical domain.
  • Design requirements and standards for interoperability of biomedical language technologies.
  • Coordinate community assessment and evaluation challenges of biomedical text mining tasks.
  • Promote the uptake of biomedical text mining technologies and relevant standards.

One of the unit's main aims is to provide biomedical text mining and language processing infrastructures that can be maintained efficiently over time and integrated into biomedical analysis platforms comprising data from experimental outcomes of patient-derived information.