Statistical Methods for Incorporating Machine Learning Tools in Inference and Large-Scale Surveillance using Electronic Medical Records Data

Carone, Marco

Abstract

The modernization and standardization of clinical care information systems are creating large networks of linked electronic health records (EHR) that capture key treatments and select patient outcomes for millions of patients throughout the country. The observational data emerging from these systems provide an unparalleled opportunity to learn about the effectiveness of existing and novel treatments, and to monitor potential safety issues that may arise when interventions are used in broad patient populations. However, exposures in observational clinical data are driven by many factors, so aggressive adjustment is needed to remove as much confounding bias as possible before effects can be attributed to select exposures. The field of machine learning provides a powerful collection of data-driven approaches for performing flexible, thorough confounding adjustment, but performing reliable statistical inference is particularly challenging when these techniques are used as part of the analytic strategy. We propose to advance reproducible research methods by developing and illustrating novel targeted learning tools that leverage the flexibility of machine learning methods to detect and characterize health effect signals using large-scale EHR data. Specifically, we will first develop techniques for making efficient, statistically valid and robust inference for treatment effects using state-of-the-art machine learning tools. We will also develop online learning techniques to make such inference in the context of streaming EHR data. These methodological advances will enable us to formulate a formal, rigorous and practical framework for conducting continuous, effective and reliable surveillance for safety endpoints.
Finally, we will develop statistical approaches for incorporating prior information, such as demographic, epidemiologic or pharmacodynamic knowledge, to improve health effect estimation and inference when the health outcome of interest is rare and the statistical problem is thus difficult, as often occurs in safety surveillance. The ultimate goal of the proposed research is to enable biomedical researchers and public health regulators to carefully monitor and protect the health of the public by allowing them to detect, more effectively and more reliably, critical health effect signals that may be contained in population-scale EHR data.

Public Health Relevance

The modernization and standardization of clinical care information systems are creating large networks of linked electronic medical records that capture key treatments and select patient outcomes for millions of U.S. subjects. The population scale of contemporary health care data is opening new opportunities for quickly learning from observational data, and is now supporting ongoing national surveillance that will monitor the risks and benefits of both existing and novel treatment paths. The objective of this proposal is to provide an inferential framework that leverages the flexibility of machine learning methods to detect health effect signals, including in the important setting of high-dimensional confounders and/or rare events, and to develop a real-time sequential updating methodology for safety signal detection.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Research Project (R01)
Project #: 5R01HL137808-02
Application #: 9979940
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Roper, Rebecca

Project Start: 2019-07-18
Project End: 2024-06-30
Budget Start: 2020-07-01
Budget End: 2021-06-30
Support Year: 2
Fiscal Year: 2020

Institution

Name: University of Washington
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2020 R01 HL	Statistical Methods for Incorporating Machine Learning Tools in Inference and Large-Scale Surveillance using Electronic Medical Records Data Carone, Marco / University of Washington
NIH 2019 R01 HL	Statistical Methods for Incorporating Machine Learning Tools in Inference and Large-Scale Surveillance using Electronic Medical Records Data Carone, Marco / University of Washington
