Almost one-half of Americans are estimated to suffer from chronic diseases, yet epidemiologic investigations are limited by the difficulty of ascertaining disease status at scale, even in the era of electronic medical records (EMRs). For example, algorithms based on structured data (e.g., ICD-9 codes) for asthma lack the sensitivity required for population-based studies, while manual medical record review of EMRs is labor-intensive and thus inefficient for population-scale ascertainment of disease status. The lack of efficient ways to ascertain disease status has severely restricted the scope of investigation for chronic diseases such as asthma. Furthermore, a patient's true disease status progresses over time, and this progression may not be reflected in the clinical diagnosis of that disease. We previously reported that two-thirds of children with asthma had a delay in their diagnosis (median: 3.3 years), with subsequent conditions such as remission or relapse largely unreported. Such information about disease progression may be recorded during manual medical record review, but, again, manual review limits investigations and conclusions to small-scale studies. Our long-term goal is to accelerate epidemiological investigations of chronic diseases and their temporal progression by streamlining medical record review. The main goal of this proposal is to extend a preliminary NLP-based system for asthma status ascertainment by identifying time-situated classifications of asthma onset, remission, and relapse. We will validate this system in a population health setting and release it as an open-source tool. We hypothesize that NLP methods applied to the EMR will allow us to ascertain asthma status and to track asthma disease progression with greater accuracy and efficiency than conventional approaches (billing codes or manual medical record review).
In Aim 1, we will extend our preliminary NLP system to ascertain the patient-level disease progression of asthma. Most significantly, we will ascertain time-situated asthma remission and relapse, two important events in the natural history of asthma. We will also improve methods of aggregating events, employ temporal expression and relation extraction, include structured data sources, and implement automatic feature selection.
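As a rough illustration of what aggregating dated evidence into time-situated events could look like, the sketch below turns a list of dates on which a record supported active asthma into onset, remission, and relapse events. It is not the proposed system: the function name `build_timeline`, the input format, and especially the three-year symptom-free gap used to define remission are assumptions made only for this example, since the abstract does not specify them.

```python
from datetime import date, timedelta

# ASSUMPTION: the abstract does not define remission. A fixed gap with no
# asthma evidence is used here purely for illustration.
REMISSION_GAP = timedelta(days=3 * 365)


def build_timeline(evidence_dates, last_follow_up):
    """Aggregate dated asthma evidence into (event, date) pairs.

    evidence_dates: dates on which a note or code supported active asthma.
    last_follow_up: last date on which the patient was observed.
    """
    dates = sorted(set(evidence_dates))
    if not dates:
        return []

    # The earliest supporting evidence is treated as onset.
    timeline = [("onset", dates[0])]

    # A long gap between consecutive evidence dates is read as a
    # remission followed by a relapse when evidence reappears.
    for prev, curr in zip(dates, dates[1:]):
        if curr - prev >= REMISSION_GAP:
            timeline.append(("remission", prev + REMISSION_GAP))
            timeline.append(("relapse", curr))

    # A long gap before the end of follow-up is read as a final remission.
    if last_follow_up - dates[-1] >= REMISSION_GAP:
        timeline.append(("remission", dates[-1] + REMISSION_GAP))

    return timeline


if __name__ == "__main__":
    evidence = [date(2005, 3, 1), date(2006, 1, 10), date(2011, 6, 5)]
    for event, when in build_timeline(evidence, date(2012, 1, 1)):
        print(event, when.isoformat())
```

A rule of this kind is only one possible aggregation strategy; the temporal expression and relation extraction described in Aim 1 would supply the dated evidence that such a step consumes.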
In Aim 2, we will evaluate the NLP system for its accuracy in ascertaining asthma onset, relapse, and remission. We will also verify the epidemiological (construct) validity against existing studies, and disseminate the NLP system as an open-source project, Adept (Aggregation of Disease Evidence for Patient Timelines).
Expected Outcomes: The proposed NLP system will: (i) orient clinical NLP techniques toward time-situated patient-level solutions; (ii) expand the scale of research capabilities for asthma; and (iii) provide a basis for decision support and other applications. Successful completion of this project would provide an open-source tool for ascertaining the disease progression of asthma with a general approach to aggregating evidence.
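For the accuracy evaluation in Aim 2, one common way to compare system output with a manual chart review is to compute sensitivity, specificity, and positive predictive value from paired labels. The sketch below assumes one boolean label per patient from each source; the function name `agreement_metrics` and the input format are hypothetical, and the abstract does not state which metrics the project reports.

```python
def agreement_metrics(predicted, reference):
    """Compare system labels with reference (chart review) labels.

    predicted, reference: equal-length sequences of booleans, one per
    patient, where True means "has asthma".
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives

    def ratio(numerator, denominator):
        # Return None rather than dividing by zero when a class is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    system = [True, True, False, False, True]
    chart_review = [True, False, False, True, True]
    print(agreement_metrics(system, chart_review))
```

The same comparison could be repeated separately for onset, remission, and relapse labels, with the chart review serving as the reference standard in each case.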

Public Health Relevance

Asthma is the most common chronic condition in children, yet studying asthma in large populations is difficult because it is costly to read through medical records to determine who has asthma and when they got it. We aim to use electronic medical records (especially clinical notes) to efficiently determine who has asthma and track its progression over time. An algorithm that tracks disease status over time will potentially enable further research on chronic diseases, and will ultimately improve patient care.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21AI116839-02
Application #
8995191
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Minnicozzi, Michael
Project Start
2015-01-15
Project End
2017-12-31
Budget Start
2016-01-01
Budget End
2017-12-31
Support Year
2
Fiscal Year
2016
Total Cost
Indirect Cost
Name
Mayo Clinic, Rochester
Department
Type
DUNS #
006471700
City
Rochester
State
MN
Country
United States
Zip Code
55905
Sohn, Sunghwan; Wi, Chung-Il; Wu, Stephen T et al. (2018) Ascertainment of asthma prognosis using natural language processing from electronic medical records. J Allergy Clin Immunol 141:2292-2294.e3
Wu, Stephen; Liu, Sijia; Sohn, Sunghwan et al. (2018) Modeling asynchronous event sequences with RNNs. J Biomed Inform 83:167-177
Yawn, Barbara P; Wollan, Peter C; Rank, Matthew A et al. (2018) Use of Asthma APGAR Tools in Primary Care Practices: A Cluster-Randomized Controlled Trial. Ann Fam Med 16:100-110
Wi, Chung-Il; Sohn, Sunghwan; Ali, Mir et al. (2018) Natural Language Processing for Asthma Ascertainment in Different Practice Settings. J Allergy Clin Immunol Pract 6:126-131
Kaur, Harsheen; Sohn, Sunghwan; Wi, Chung-Il et al. (2018) Automated chart review utilizing natural language processing algorithm for asthma predictive index. BMC Pulm Med 18:34
Voge, Gretchen A; Carey, William A; Ryu, Euijung et al. (2017) What accounts for the association between late preterm births and risk of asthma? Allergy Asthma Proc 38:152-156
Sohn, Sunghwan; Wang, Yanshan; Wi, Chung-Il et al. (2017) Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions. J Am Med Inform Assoc :
Wi, Chung-Il; Sohn, Sunghwan; Rolfes, Mary C et al. (2017) Application of a Natural Language Processing Algorithm to Asthma Ascertainment. An Automated Chart Review. Am J Respir Crit Care Med 196:430-437
Ryu, Euijung; Wi, Chung-Il; Crow, Sheri S et al. (2016) Assessing health disparities in children using a modified housing-related socioeconomic status measure: a cross-sectional study. BMJ Open 6:e011564