There is a deluge of health-related texts in many genres, from the clinical narrative to newswire and social media. These texts are diverse in content, format, and style, and yet they represent complementary facets of biomedical and health knowledge. Natural Language Processing (NLP) holds much promise to extract, understand, and distill valuable information from these overwhelming large and complex streams of data, with the ultimate goal to advance biomedicine and impact the health and wellbeing of patients. There have been a number of success stories in various biomedical NLP applications, but the NLP methods investigated are usually tailored to one specific phenotype and one institution, thus reducing portability and scalability. Moreover, while there has been much work in the processing of clinical texts, other genres of health texts, like narratives and posts authored by health consumers and patients, are lacking solutions to marshal and make sense of the health information they contain. Robust NLP solutions that answer the needs of biomedicine and health in general have not been fully investigated yet. A unified, data-science approach to health NLP enables the exploration of methods and solutions unprecedented up to now. Our vision is to unravel the information buried in the health narratives by advancing text-processing methods in a unified way across all the genres of texts. The crosscutting theme is the investigation of methods for health NLP (hNLP) made possible by big data, fused with health knowledge. Our proposal moves the field into exploring semi-supervised and fully unsupervised methods, which only succeed when very large amounts of data are leveraged and knowledge is injected into the methods with care. Our hNLP proposal also targets a key challenge of current hNLP research: the lack of shared software. We seek to provide a clearinghouse for software created under this proposal, and as such all developed tools will be disseminated. Starting from the data characteristics of health texts and information needs of stakeholders, we will develop and evaluate methods for information extraction, information understanding. We will translate our research into the publicly available NLP software platform cTAKES, through robust modules for extraction and understanding across all genres of health texts. We will also demonstrate impact of our methods and tools through several use cases, ranging from clinical point of care to public health, to translational and precision medicine, to participatory medicine. Finally, we will disseminate our work through community activities, such as challenges to advance the state of the art in health natural language processing.
There is a deluge of health texts. Natural Language Processing (NLP) holds much promise to unravel valuable information from these large data streams with the goal to advance medicine and the wellbeing of patients. We will advance state-of-the-art NLP by designing robust, scalable methods that leverage health big data, demonstrating relevance on high-impact use cases, and disseminating NLP tools for the research community and public at large.
Showing the most recent 10 out of 12 publications