Optimizing the Utility of Electronic Medical Records Data in Data-driven Health Research

Abstract

Medical centers continue to archive patient follow-up data in Electronic Medical Records (EMR), which have tremendous value in discovering new knowledge and insights. The large volume of EMR data can play an important role in improving the accuracy and generalizability of predictive models in healthcare, especially given that misdiagnosis is known to be the third leading cause of death in the United States. Despite these merits, EMR data are invariably corrupted by factors like missing values, outliers, and unrealistic measurements, which prevent researchers from fully utilizing such abundant data in many important studies. Many studies simply discard a large number of samples to eliminate missingness, which ultimately biases their data-driven analytical models. Existing techniques for missing data imputation use simplified linear models and are mostly suited to cross-sectional missingness, ignoring the longitudinal missingness found in patient follow-up data. This proposal aims to investigate novel artificial intelligence (AI)-based models to improve the quality and utility of EMR data in preparation for data-driven retrospective studies. Toward this preparation, the goals of the project are 1) to investigate data imputation models that are more accurate and robust than existing ones and 2) to adapt state-of-the-art deep learning techniques to prepare an optimal representation of large EMR data. The proposed research will 1) maximize the quality and utility of EMR data to support a multitude of retrospective studies, 2) enable visualization of complex patient data, 3) identify the more important and predictive clinical parameters, and 4) yield a compact and optimal representation of large EMR datasets. We hypothesize that EMR data optimally processed with state-of-the-art AI models can model patient risk more accurately than existing statistical and clinical risk models.
This project will combine the complementary expertise of the collaborators, Dr. Manar Samad, PhD (Computer Science), Dr. Owen Johnson, DPH (Biostatistics and Public Health), and Dr. Edilberto Raynes, MD, PhD (Medicine), along with the participating undergraduate students at Tennessee State University (TSU). The proposal entails several research and development components that will allow undergraduate students to gain valuable research and analytical skills in data science, programming, and health informatics. The project activities will expose health science students to AI-based computing solutions, broadening the scope of their future health research and careers. This project will help TSU prepare a strong workforce of minority students with competitive skill sets in data science and health informatics, which are currently in high demand. Overall, the project will develop a data-capable workforce to strengthen interdisciplinary research capacity and collaboration between the Departments of Computer Science and Health Science at TSU.

Public Health Relevance

Electronic Medical Record (EMR) data are invariably corrupted by missing and redundant data, which limit their application in many valuable data-driven research studies aiming to achieve the goals of precision medicine. This project aims to develop several innovative computational frameworks that will optimally prepare and utilize EMR data to facilitate research studies relevant to patient risk modeling, discovery of new health markers, and patient-specific prognosis and therapeutic strategies. The project will leverage recent advances in machine learning and data science through an interdisciplinary team of three faculty members, two graduate students, and four undergraduate students, establishing a data-centric research collaboration between the Departments of Computer Science and Health Science.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Academic Research Enhancement Awards (AREA) (R15)
Project #
1R15LM013569-01
Application #
10111205
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Ye, Jane
Project Start
2020-09-18
Project End
2023-08-31
Budget Start
2020-09-18
Budget End
2023-08-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Tennessee State University
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
108814179
City
Nashville
State
TN
Country
United States
Zip Code
37209