Most medical decisions are made without the support of rigorous evidence, in large part because of the cost and complexity of performing randomized trials for most clinical situations. In practice, clinicians must use their judgment, informed by their own experience and the collective experience of their colleagues. The advent of the electronic health record (EHR) enables the modern practitioner to algorithmically search the records of thousands or millions of patients to rapidly find similar cases and compare outcomes. In addition to filling the inferential gap in actionable evidence, these kinds of analyses avoid issues of ethics, practicality, and generalizability that plague randomized clinical trials (RCTs). Unfortunately, identifying patients with the appropriate phenotypes, properly leveraging available data to adjust results, and matching similar patients to reduce confounding remain critical challenges in every study that uses EHR data. Overcoming these challenges to improve the accuracy of observational studies conducted with EHR data is of paramount importance. Studies using EHR data begin by defining a set of patients with specific phenotypes, analogous to amassing a cohort for a clinical trial. This process, electronic phenotyping, is typically done via a set of rules defined by experts. Machine learning approaches are increasingly used to complement consensus definitions created by experts, and we propose several advances to validate and improve this practice. We will also explore and quantify the effects of feature engineering choices made when transforming the diagnoses, procedures, medications, laboratory tests, and clinical notes in the EHR into a computable feature matrix. Finally, building on recent advances, we plan to characterize the performance of existing methods and develop EHR-specific strategies for patient matching.
Our work is significant because we will take on three challenging problems (electronic phenotyping, feature engineering, and patient matching) that stand in the way of generating insights from EHR data. If we are successful, we will significantly advance our ability to learn from the large amounts of health data that are routinely generated as a byproduct of clinical processes.
The advent of the electronic health record (EHR) enables the search of thousands or millions of patients to rapidly find similar cases and compare outcomes. We will develop methods for feature engineering, electronic phenotyping and patient matching from real-world EHR data. If we are successful, we will significantly advance our ability to generate insights from the large amounts of health data that are routinely generated as a byproduct of clinical processes.