Using computers to understand the language of diseases: capabilities and challenges

8 June 2015


Reported by Henry Rex, CSaP Policy & Communications Officer.

The eighth meeting of CSaP's Policy Leaders Fellowship focused on 'Infection', with a talk by Dr Nigel Collier.

The beginning of June saw the eighth meeting of CSaP’s Policy Leaders Fellowship, which brought together senior policy makers with senior academics to explore issues of mutual interest. The focus of this meeting’s discussion was infection.

The discussion got underway with a talk from Dr Nigel Collier on “Using computers to understand the language of diseases”. Dr Collier is a Principal Research Associate in the Language Technology Laboratory, within the Department of Theoretical and Applied Linguistics here at the University of Cambridge. He spent 12 years leading the natural language processing laboratory at the Japanese National Institute of Informatics, served as Technical Advisor to the Early Alerting and Reporting working group of the Global Health Security Action Group, and then spent 2 years as a Marie Curie Fellow in the European Bioinformatics Institute, before coming to Cambridge to co-found the Language Technology Lab.

The locus of his research is the interface of health and artificial intelligence, and his talk focused on how intelligent natural language processing and machine learning can enhance the work of life scientists and clinicians.

In recent years there has been an explosion in health data from a variety of sources. But much of this data is opaque to computers, as it exists only in unstructured natural language forms (e.g. biomedical literature, lab notebooks, clinical notes, diagnostic reports, clinical trials data, etc.). Because this data is written in human language, we need to develop mechanisms to translate it into a structured semantic representation that computers can analyse. Natural language processing (NLP) is the technology that performs this translation from human language into a form computers can process.
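To make the idea of a "structured semantic representation" concrete, here is a minimal sketch of turning one sentence of free text into a structured record. It is purely illustrative and is not the method described in the talk: the tiny word lists, the `Event` record and the `extract_event` function are all hypothetical, and a real system would rely on large ontologies and statistical models rather than lookups.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicons, for illustration only.
DISEASES = {"influenza", "cholera", "measles"}
LOCATIONS = {"tokyo", "london", "hanoi"}


@dataclass
class Event:
    disease: Optional[str]
    location: Optional[str]
    cases: Optional[int]


def extract_event(text: str) -> Event:
    """Map one free-text sentence to a structured record via lexicon lookup."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z]+", lowered)
    disease = next((t for t in tokens if t in DISEASES), None)
    location = next((t for t in tokens if t in LOCATIONS), None)
    # First number followed (optionally after one word) by "case" or "cases".
    match = re.search(r"(\d[\d,]*)\s+(?:\w+\s+)?cases?\b", lowered)
    cases = int(match.group(1).replace(",", "")) if match else None
    return Event(disease, location, cases)


event = extract_event("Health officials in Hanoi confirmed 14 new cases of cholera.")
print(event)
```

Once text is in this form, records from many sources can be counted, compared and mapped, which is what makes the downstream analysis possible.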

Dr Collier described how harnessing unstructured health data could help develop early warning systems for infectious disease outbreaks and tools for mapping global health, and could lead to a better understanding of patients and their concerns. He used the BioCaster project, which he worked on while in Japan, as an example, and noted how the transport system can contribute to the rapid spread of diseases, using swine flu as a case study (in which 2 cases grew to 12,500 in one month).
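The swine flu figures give a sense of how little time there is to respond. As a back-of-the-envelope sketch, and assuming (this is not stated in the talk) steady exponential growth over a 30-day month, the implied doubling time can be worked out as follows. Reported case counts also reflect testing and reporting effort, so this is indicative only.

```python
import math

initial_cases = 2
final_cases = 12_500
days = 30  # assumption: "one month" taken as 30 days

# Number of doublings needed to get from the initial to the final count.
doublings = math.log2(final_cases / initial_cases)
doubling_time_days = days / doublings

print(f"{doublings:.1f} doublings, one roughly every {doubling_time_days:.1f} days")
```

Under these assumptions the case count doubles roughly every 2.4 days, so a warning that arrives a week earlier corresponds to almost three doublings.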

Access to early and detailed information can help governments respond to outbreaks more quickly and more effectively. This information can come from lab reports, field workers, GP reports and sentinel networks (such as trends in drug purchases). But monitoring rumours can often be the quickest way to obtain information, and it complements more traditional sources. BioCaster used Web-based text mining of ‘rumours’ (from news articles, social media, blogs, etc.) to develop an early-warning system for infectious disease outbreaks, and over the past 5 years it has proven highly effective in providing epidemic intelligence. A joint study with the Early Alerting and Reporting group found that combining open-source public health intelligence systems such as BioCaster could bring detection forward by over a week for epidemic avian influenza when compared to WHO/OIE alerts, although detection rates were still significantly below human expert standards.
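One simple way to turn mined ‘rumours’ into an alert is to count daily mentions of a disease in a given place and flag days that rise well above the recent baseline. The sketch below illustrates that general idea only; it is not BioCaster's actual algorithm, and the function name, window length, threshold and sample counts are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_spikes(counts, window=7, k=3.0):
    """Return indices of days whose count exceeds baseline mean + k * std dev."""
    alerts = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Floor sigma at 1.0 so a very flat baseline does not alert on tiny changes.
        if counts[i] > mu + k * max(sigma, 1.0):
            alerts.append(i)
    return alerts


# Made-up daily counts of news items mentioning one disease in one region.
daily_mentions = [2, 3, 1, 2, 4, 2, 3, 2, 3, 15, 22]
alerts = flag_spikes(daily_mentions)
print(alerts)
```

With this made-up series, only the final two days are flagged. A threshold like this trades sensitivity against false alarms, which is one reason automated detection can still fall short of human expert judgement.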

The potential for this technology in the public health sector is enormous, but there are technical issues which still need to be overcome. Chief among them is ambiguity: place names can be hard to geo-locate (e.g. Cambridge UK or Cambridge MA), and people's tendency towards hyperbole and metaphor (e.g. ‘Obama fever’; ‘enthusiasm is infectious’) can look like a disease report to a computer.
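The place-name problem can be illustrated with a toy disambiguator that looks for supporting words elsewhere in the sentence. This is a sketch under stated assumptions rather than a description of any real system: the `GAZETTEER` entries, the cue words and the `resolve_place` function are hypothetical, and production systems use far richer context.

```python
import re

# Hypothetical gazetteer: each candidate sense lists context words that support it.
GAZETTEER = {
    "cambridge": {
        "Cambridge, UK": {"uk", "england", "cambridgeshire", "addenbrooke"},
        "Cambridge, MA": {"massachusetts", "boston", "harvard", "mit"},
    }
}


def resolve_place(name, sentence):
    """Pick the sense whose cue words overlap most with the sentence.

    Returns None when the name is unknown, no cue matches, or the senses tie.
    """
    senses = GAZETTEER.get(name.lower())
    if not senses:
        return None
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(cues & words) for sense, cues in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


resolved = resolve_place("Cambridge", "Three cases were seen at a Cambridge clinic near Boston.")
unresolved = resolve_place("Cambridge", "A case was reported in Cambridge.")
print(resolved, unresolved)
```

Returning nothing when the evidence is missing or tied is a deliberate choice here: for outbreak mapping, an unresolved location is usually safer than a confident wrong one.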

Dr Collier closed his talk by highlighting the many aspects of health care that can be informed by NLP analysis of health data, the most important being faster detection of outbreaks, low-cost evidence for drug monitoring, targeted risk communication, effective clinical intervention, and detailed disease mapping.

The presentation was followed by a lively debate among the assembled Policy Leaders Fellows.

(Banner image from John Voo via Flickr)