Rapid progress in translational bioinformatics and clinical informatics for precision medicine has produced many computing and informatics methodologies that improve prediction, diagnosis, and treatment strategy in clinical practice. In particular, high-dimensional, large-scale biomedical data sets, ranging from clinical data to omics data, provide an unprecedented opportunity to translate newly found knowledge from biomedical big data analytics into support for clinical decisions. The complexity and scale of these data sets hold great promise, yet present substantial challenges. One important concern for clinicians is comorbidity, a well-documented phenomenon in medicine in which one or more medical conditions exist and potentially interact with one another, thereby influencing the primary clinical condition. Several studies show variability in the number of comorbid conditions that can exist at one time, and patterns of disease presentation differ from one chronic condition to another. Thus, there is a clear need to improve care for individuals with multiple comorbidities, but doing so requires a much more detailed understanding of the trends of disease associations than we currently possess. Previous studies have primarily focused on a handful of specific comorbidities; investigating the underlying causes of broad disease comorbidity across the human diseasome has been challenging. Fortunately, in the past decade, comprehensive collections of disease diagnosis data have become available, primarily in the form of data from electronic health records (EHRs). Retrospectively, we can use a patient's health history to identify comorbidities and apply a data-driven approach to studying disease comorbidity patterns that considers all possible disease comorbidities.
In particular, developing computational methods and models for large-scale data that integrate newly defined comorbidity patterns with genomics holds great potential for uncovering molecular mechanisms of disease. Primarily, we will elucidate the underlying genetic and non-genetic factors that influence disease comorbidity. We will apply two orthogonal approaches to identify comorbidities: 1) deriving them from disease co-occurrence using EHR data alone, and 2) deriving them from pleiotropic genetic associations using the EHR-linked biobank dataset. Network-based approaches have the potential to uncover unexpected relationships between diseases. One of the most significant advantages of our proposal is the linking of a single-source EHR to genomic data; this provides the opportunity to revisit individual-level genotype and phenotype data to design more targeted studies and to ask more specific questions. Additionally, our results can be used to develop a novel comorbidity risk score that combines clinical data and genetic effects, which might constitute a new tool for clinical prevention and monitoring. These goals are very much in keeping with today's climate of precision medicine, where treatment and prevention are ideally designed to consider an individual patient's variability in genetics, lifestyle, and environmental exposures.
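
To make the first approach (comorbidity derived from disease co-occurrence in EHR data alone) concrete, the following is a minimal sketch and not the proposal's actual method. It assumes each patient is represented by a list of diagnosis codes, and it scores every disease pair with two commonly used co-occurrence measures: relative risk and the phi correlation. The function name `comorbidity_edges`, the `min_cooccurrence` parameter, and the toy patient data are all hypothetical and introduced here only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def comorbidity_edges(patient_diagnoses, min_cooccurrence=1):
    """Score disease pairs by co-occurrence across patients.

    patient_diagnoses: dict mapping a patient ID to an iterable of
    diagnosis codes (hypothetical input layout, assumed for this sketch).
    Returns a dict mapping (disease_a, disease_b) to its statistics.
    """
    n = len(patient_diagnoses)
    prevalence = Counter()   # number of patients with each disease
    pair_counts = Counter()  # number of patients with both diseases

    for codes in patient_diagnoses.values():
        unique = sorted(set(codes))  # count each disease once per patient
        prevalence.update(unique)
        pair_counts.update(combinations(unique, 2))

    edges = {}
    for (a, b), c_ab in pair_counts.items():
        if c_ab < min_cooccurrence:
            continue
        p_a, p_b = prevalence[a], prevalence[b]
        # Relative risk: observed co-occurrence over the count expected
        # if the two diseases were independent.
        relative_risk = (c_ab * n) / (p_a * p_b)
        # Phi: Pearson correlation between the two binary disease indicators.
        denom = sqrt(p_a * p_b * (n - p_a) * (n - p_b))
        phi = (c_ab * n - p_a * p_b) / denom if denom > 0 else 0.0
        edges[(a, b)] = {
            "count": c_ab,
            "relative_risk": relative_risk,
            "phi": phi,
        }
    return edges


# Toy data for illustration only; not drawn from any real EHR.
toy_patients = {
    "p1": ["T2D", "HTN"],
    "p2": ["T2D", "HTN"],
    "p3": ["T2D"],
    "p4": ["ASTHMA"],
}
toy_edges = comorbidity_edges(toy_patients)
```

Each scored pair can be read as a weighted edge in a disease comorbidity network. A real analysis would also need significance testing and adjustment for confounders such as age and sex, which this sketch omits.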

Public Health Relevance

There is a clear need to improve care for individuals with multiple comorbidities, but this requires a much more detailed understanding of the trends of disease associations than we currently possess. EHR-linked biobank data provide a unique opportunity to investigate cross-phenotype associations and might broaden the understanding of the genetic architecture that exists between diagnoses, genes, and pleiotropy. In this proposal, we will construct a disease comorbidity map of 2.1 million patients using longitudinal EHRs at Penn Medicine (Aim 1), construct a disease-gene map derived from a phenome-wide association study of Penn Medicine Biobank participants (Aim 2), and develop a novel scoring system that uses graph-based machine learning to predict comorbidity risk scores (CRS) for a given disease (Aim 3).

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM138597-01
Application #
10034691
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Ravichandran, Veerasamy
Project Start
2020-08-01
Project End
2024-07-31
Budget Start
2020-08-01
Budget End
2021-07-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104