Most medical decisions are made without the support of rigorous evidence, in large part because of the cost and complexity of performing randomized trials for most clinical situations. In practice, clinicians must use their judgement, informed by their own experience and the collective experience of their colleagues. The advent of the electronic health record (EHR) enables the modern practitioner to algorithmically check the records of thousands or millions of patients to rapidly find similar cases and compare outcomes. In addition to filling the inferential gap in actionable evidence, these kinds of analyses avoid issues of ethics, practicality, and generalizability that plague randomized clinical trials (RCTs). Unfortunately, identifying patients with the appropriate phenotypes, properly leveraging available data to adjust results, and matching similar patients to reduce confounding remain critical challenges in every study that uses EHR data. Overcoming these challenges to improve the accuracy of observational studies conducted with EHR data is of paramount importance. Studies using EHR data begin by defining a set of patients with specific phenotypes, analogous to amassing a cohort for a clinical trial. This process of electronic phenotyping is typically done via a set of rules defined by experts. Machine learning approaches are increasingly used to complement consensus definitions created by experts, and we propose several advances to validate and improve this practice. We will explore and quantify the effects of feature engineering choices made to transform the diagnoses, procedures, medications, laboratory tests, and clinical notes in the EHR into a computable feature matrix. Finally, building on recent advances, we plan to characterize the performance of existing methods and develop EHR-specific strategies for patient matching.
Our work is significant because we will take on three challenging problems--electronic phenotyping, feature engineering, and patient matching--that stand in the way of generating insights via EHR data. If we are successful, we will significantly advance our ability to generate insights from the large amounts of health data that are routinely generated as a byproduct of clinical processes.

Public Health Relevance

The advent of the electronic health record (EHR) enables the search of thousands or millions of patients to rapidly find similar cases and compare outcomes. We will develop methods for feature engineering, electronic phenotyping and patient matching from real-world EHR data. If we are successful, we will significantly advance our ability to generate insights from the large amounts of health data that are routinely generated as a byproduct of clinical processes.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM011369-06
Application #: 9535477
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2013-09-01
Project End: 2021-08-31
Budget Start: 2018-09-01
Budget End: 2019-08-31
Support Year: 6
Fiscal Year: 2018
Total Cost:
Indirect Cost:
Name: Stanford University
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94304
Callahan, Alison; Winnenburg, Rainer; Shah, Nigam H (2018) U-Index, a dataset and an impact metric for informatics tools and databases. Sci Data 5:180043
Coulet, Adrien; Shah, Nigam H; Wack, Maxime et al. (2018) Predicting the need for a reduced drug dose, at first prescription. Sci Rep 8:15558
Wang, Liwei; Rastegar-Mojarad, Majid; Ji, Zhiliang et al. (2018) Detecting Pharmacovigilance Signals Combining Electronic Medical Records With Spontaneous Reports: A Case Study of Conventional Disease-Modifying Antirheumatic Drugs for Rheumatoid Arthritis. Front Pharmacol 9:875
Ravikumar, K E; Rastegar-Mojarad, Majid; Liu, Hongfang (2017) BELMiner: adapting a rule-based relation extraction system to extract biological expression language statements from bio-medical literature evidence sentences. Database (Oxford) 2017:
Agarwal, Vibhu; Shah, Nigam H (2017) Learning attributes of disease progression from trajectories of sparse lab values. Pac Symp Biocomput 22:184-194
Low, Yen Sia; Gallego, Blanca; Shah, Nigam Haresh (2016) Comparing high-dimensional confounder control methods for rapid cohort studies from electronic health records. J Comp Eff Res 5:179-92
Liu, Sijia; Liu, Hongfang; Chaudhary, Vipin et al. (2016) An Infinite Mixture Model for Coreference Resolution in Clinical Notes. AMIA Jt Summits Transl Sci Proc 2016:428-37
Nead, Kevin T; Gaskin, Greg; Chester, Cariad et al. (2016) Influence of age on androgen deprivation therapy-associated Alzheimer's disease. Sci Rep 6:35695
Winnenburg, Rainer; Shah, Nigam H (2016) Generalized enrichment analysis improves the detection of adverse drug events from the biomedical literature. BMC Bioinformatics 17:250
Gaskin, Gregory L; Pershing, Suzann; Cole, Tyler S et al. (2016) Predictive modeling of risk factors and complications of cataract surgery. Eur J Ophthalmol 26:328-37

Showing the most recent 10 out of 36 publications