Extended Methods and Software Development for Health NLP

Elhadad, Noemie; Savova, Guergana

Abstract

There is a deluge of health-related texts in many genres, from the clinical narrative to newswire and social media. These texts are diverse in content, format, and style, and yet they represent complementary facets of biomedical and health knowledge. Natural Language Processing (NLP) holds much promise to extract, understand, and distill valuable information from these overwhelmingly large and complex streams of data, with the ultimate goal of advancing biomedicine and improving the health and wellbeing of patients. There have been a number of success stories in various biomedical NLP applications, but the NLP methods investigated are usually tailored to one specific phenotype and one institution, which reduces portability and scalability. Moreover, while there has been much work in the processing of clinical texts, other genres of health texts, like narratives and posts authored by health consumers and patients, lack solutions to marshal and make sense of the health information they contain. Robust NLP solutions that answer the needs of biomedicine and health in general have not been fully investigated yet. A unified, data-science approach to health NLP enables the exploration of methods and solutions that were not possible until now. Our vision is to unravel the information buried in health narratives by advancing text-processing methods in a unified way across all genres of text. The crosscutting theme is the investigation of methods for health NLP (hNLP) made possible by big data, fused with health knowledge. Our proposal moves the field toward semi-supervised and fully unsupervised methods, which succeed only when very large amounts of data are leveraged and knowledge is injected into the methods with care. Our hNLP proposal also targets a key challenge of current hNLP research: the lack of shared software. We seek to provide a clearinghouse for software created under this proposal, and as such all developed tools will be disseminated.
Starting from the data characteristics of health texts and the information needs of stakeholders, we will develop and evaluate methods for information extraction and information understanding. We will translate our research into the publicly available NLP software platform cTAKES, through robust modules for extraction and understanding across all genres of health texts. We will also demonstrate the impact of our methods and tools through several use cases, ranging from clinical point of care to public health, to translational and precision medicine, to participatory medicine. Finally, we will disseminate our work through community activities, such as challenges to advance the state of the art in health natural language processing.

Public Health Relevance

There is a deluge of health texts. Natural Language Processing (NLP) holds much promise to unravel valuable information from these large data streams, with the goal of advancing medicine and the wellbeing of patients. We will advance state-of-the-art NLP by designing robust, scalable methods that leverage health big data, demonstrating relevance on high-impact use cases, and disseminating NLP tools to the research community and the public at large.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM114355-03
Application #: 9421556
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy

Project Start: 2016-01-01
Project End: 2019-12-31
Budget Start: 2018-01-01
Budget End: 2018-12-31
Support Year: 3
Fiscal Year: 2018
Total Cost
Indirect Cost

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 621889815

City: New York
State: NY
Country: United States
Zip Code: 10032

Related projects


NIH 2019 R01 GM	Extended Methods and Software Development for Health NLP Elhadad, Noemie; Savova, Guergana K. / Columbia University (N.Y.)
NIH 2018 R01 GM	Extended Methods and Software Development for Health NLP Elhadad, Noemie; Savova, Guergana K. / Columbia University (N.Y.)
NIH 2017 R01 GM	Extended Methods and Software Development for Health NLP Elhadad, Noemie; Savova, Guergana K. / Columbia University (N.Y.)	$718,945
NIH 2016 R01 GM	Extended Methods and Software Development for Health NLP Elhadad, Noemie; Savova, Guergana K. / Columbia University (N.Y.)	$844,963

Publications

Osborne, John D; Neu, Matthew B; Danila, Maria I et al. (2018) CUILESS2016: a clinical corpus applying compositional normalization of text mentions. J Biomed Semantics 9:2

Xu, Dongfang; Yadav, Vikas; Bethard, Steven (2018) UArizona at the MADE1.0 NLP Challenge. Proc Mach Learn Res 90:57-65

Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra et al. (2018) Clinical Natural Language Processing in languages other than English: opportunities and challenges. J Biomed Semantics 9:12

Zhang, Shaodian; Grave, Edouard; Sklar, Elizabeth et al. (2017) Longitudinal analysis of discussion topics in an online breast cancer community using convolutional neural networks. J Biomed Inform 69:1-9

Zhang, Shaodian; Kang, Tian; Qiu, Lin et al. (2017) Cataloguing Treatments Discussed and Used in Online Autism Communities. Proc Int World Wide Web Conf 2017:123-131

Zhang, Shaodian; Qiu, Lin; Chen, Frank et al. (2017) "We make choices we think are going to save us": Debate and stance identification for online breast cancer CAM discussions. Proc Int World Wide Web Conf 2017:1073-1081

Gonzalez-Hernandez, G; Sarker, A; O'Connor, K et al. (2017) Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. Yearb Med Inform 26:214-227

Zhang, Shaodian; O'Carroll Bantum, Erin; Owen, Jason et al. (2017) Online cancer communities as informatics intervention for social support: conceptualization, characterization, and impact. J Am Med Inform Assoc 24:451-459

Sadeque, Farig; Xu, Dongfang; Bethard, Steven (2017) UArizona at the CLEF eRisk 2017 Pilot Task: Linear and Recurrent Models for Early Depression Detection. CEUR Workshop Proc 1866:

Zhang, Shaodian; Elhadad, Noémie (2016) Factors Contributing to Dropping-out in an Online Health Community: Static and Longitudinal Analyses. AMIA Annu Symp Proc 2016:2090-2099

Showing the most recent 10 out of 12 publications
