This Career Development Application describes targeted coursework and mentored research for progression to independent research in the use of electronic health record data for disease subtyping. Electronic health records have demonstrated great promise as a scalable source of data for biomedical research to enable "precision medicine." Use of natural language processing techniques has enabled computational analysis of specific terms found in free text clinical notes. An improved ability to extract symptom information from clinical notes would improve researchers' ability to use de-identified data from patient records for discovery of disease subtypes. Symptom-related terms are particularly important in the context of mental health, but also harder to detect in notes than other terms like diseases or drug names.
The research aims of this proposal present a novel approach to scalable extension of biomedical terminologies and improved detection of those terms and their modifiers (e.g., severe, familial, absent). The richer dataset extracted using these enhanced approaches will then be used to define patient cohorts and to detect disease subtypes and predictors of response to specific pharmaceutical interventions. The resulting patient stratification will be compared to groupings made without the enriched data and validated on an independent data set. The overarching hypothesis of this work is that enhanced mining of clinical notes will enable statistically significant and clinically relevant symptom-based stratification of psychiatric disorders. In order to test this hypothesis, I will:
Aim 1: Develop a semi-automated pipeline for domain-specific terminology extension
Aim 2: Define and stratify patient cohorts through use of enhanced term extraction
Aim 3: Evaluate the validity and utility of the richer set of data obtained through Aims 1 and 2
Mental health is among the areas of greatest need for more evidence-based disease stratification, and also among the most challenging. Mental health disorders account for 30% of non-fatal disease burden worldwide, and pose an economic burden of trillions of dollars and climbing. Moreover, mental health symptoms are generally subjective and self-reported, with few objectively measurable signs. The impact of this proposal is that it will dramatically improve our ability to use EHR data to stratify patients in this drastically underserved area of health and healthcare. The major innovations of this project are the adaptation and application of a semi-supervised pattern learning pipeline to augment mental health terminologies, and a novel approach to disease stratification using a significantly underutilized source of biomedical data, namely clinical notes. This work addresses a major challenge for mining clinical notes in rapidly evolving biomedical domains and leverages a valuable, largely untapped source of medical evidence. Together, these methods for enhanced use of clinical notes will enable identification of distinct patient subgroups using data that currently sit idle in EHRs.

Public Health Relevance

Electronic health records have demonstrated great promise as a scalable source of data for biomedical research to enable "precision medicine." Natural language processing has enabled the use of information from clinical notes in addition to structured data like diagnoses and lab values. This work aims to improve our ability to extract useful information from electronic health records to enable disease subtyping, particularly in the area of mental health.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Scientist Development Award - Research & Training (K01)
Project #
1K01LM012529-01A1
Application #
9453180
Study Section
Special Emphasis Panel (ZLM1)
Program Officer
Sim, Hua-Chuan
Project Start
2017-09-16
Project End
2020-08-31
Budget Start
2017-09-16
Budget End
2018-08-31
Support Year
1
Fiscal Year
2017
Total Cost
Indirect Cost
Name
Duke University
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
044387793
City
Durham
State
NC
Country
United States
Zip Code
27705
Tenenbaum, Jessica D; Bhuvaneshwar, Krithika; Gagliardi, Jane P et al. (2017) Translational bioinformatics in mental health: open access data sources and computational biomarker discovery. Brief Bioinform :