The promise of precision medicine is to edit a patient's DNA and/or administer therapeutics targeting etiologic molecules that prevent or reverse the disease process using a tailored design. All of this happens at the level of the individual and requires precision knowledge of that patient's biology. In stark contrast, much of the knowledge we possess about genomic risk factors comes from statistical measures of association from human populations. The conceptual and practical disconnect between the populations we study and the individuals we want to treat is a major source of confusion about how to move forward in an era driven by genome technology. The primary goal of this proposal is to develop novel informatics methodology and software to facilitate precision medicine by connecting population and individual genomic phenomena. We propose here a Virtual Genomic Medicine (VGMed) workbench where clinicians can carry out thought experiments about the treatment of individual patients using models of disease risk derived from population-level studies. This will be accomplished by first developing a novel Genomics-guided Automated Machine Learning (GAML) algorithm for deriving risk models from real data that is accessible to clinicians (AIM 1). We will then develop a novel simulation approach that is able to generate artificial data that preserves the distribution of genetic effects observed in the real data while maintaining other characteristics such as genotype frequencies (AIM 2). This will generate open data allowing anyone to perform virtual interventions on patients derived from a population-level risk distribution. The workbench will allow editing of individual genotypes and simulate the administration of drugs by editing machine learning parameters in the simulation model (AIM 3). The change in risk and disease status for the specific patient will be tracked in real time.
Finally, we will provide a feature in the workbench that allows the clinician to generate specific hypotheses about individual genetic variants. These hypotheses can then be validated against integrated knowledge sources, including databases such as PubMed and ClinVar, thus giving the user immediate feedback (AIM 4). All methods and software will be provided as open source (AIM 5).
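
As a rough illustration of the kind of virtual intervention described above (editing a genotype or a model parameter and tracking the change in risk), the following minimal sketch uses a simple additive logistic risk model. It is not the GAML algorithm or the VGMed workbench: the variant names, weights, intercept, and function names are all hypothetical and chosen only to make the idea concrete.

```python
import math

# Hypothetical per-variant log-odds weights and intercept (illustrative only;
# not taken from the proposal or from any real study).
WEIGHTS = {"rs_A": 0.8, "rs_B": -0.3, "rs_C": 0.5}
INTERCEPT = -2.0


def risk(genotypes, weights=WEIGHTS, intercept=INTERCEPT):
    """Predicted disease risk from an additive logistic model.

    Each genotype is coded as the number of risk-allele copies (0, 1, or 2).
    """
    score = intercept + sum(weights[v] * genotypes[v] for v in weights)
    return 1.0 / (1.0 + math.exp(-score))


def edit_genotype(genotypes, variant, new_value):
    """Return a copy of the patient's genotypes with one variant edited."""
    if new_value not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    edited = dict(genotypes)
    edited[variant] = new_value
    return edited


def simulate_drug(weights, variant, factor):
    """Return a copy of the model weights with one variant's effect scaled.

    This stands in for a drug that dampens the effect of a variant.
    """
    adjusted = dict(weights)
    adjusted[variant] = adjusted[variant] * factor
    return adjusted


# One virtual patient (hypothetical genotypes).
patient = {"rs_A": 2, "rs_B": 1, "rs_C": 0}

baseline = risk(patient)
after_edit = risk(edit_genotype(patient, "rs_A", 0))
after_drug = risk(patient, weights=simulate_drug(WEIGHTS, "rs_A", 0.5))

print(f"baseline risk:        {baseline:.3f}")
print(f"after genotype edit:  {after_edit:.3f}")
print(f"after simulated drug: {after_drug:.3f}")
```

In this toy setting, both interventions lower the predicted risk because the edited variant carries a positive weight. A model derived from real population data could include interactions between variants, so the effect of an edit would depend on the rest of the patient's genotype.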

Public Health Relevance

Most genetic studies of common human diseases result in statistical summaries of risk derived from human populations. These statistical summaries are of limited use for determining the health of an individual. This proposal will create new computer algorithms and software to help clinicians and researchers connect population-level statistics with individual-level genetic effects, advancing our understanding of how to treat patients based on their own unique genetic makeup.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
2R01LM010098-10
Application #
9661406
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2009-09-30
Project End
2024-02-28
Budget Start
2019-03-05
Budget End
2020-02-29
Support Year
10
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Urbanowicz, Ryan J; Olson, Randal S; Schmitt, Peter et al. (2018) Benchmarking relief-based feature selection methods for bioinformatics data mining. J Biomed Inform 85:168-188
Manduchi, Elisabetta; Williams, Scott M; Chesi, Alessandra et al. (2018) Leveraging epigenomics and contactomics data to investigate SNP pairs in GWAS. Hum Genet 137:413-425
Urbanowicz, Ryan J; Meeker, Melissa; La Cava, William et al. (2018) Relief-based feature selection: Introduction and review. J Biomed Inform 85:189-203
Chernikova, Diana A; Madan, Juliette C; Housman, Molly L et al. (2018) The premature infant gut microbiome during the first 6 weeks of life differs based on gestational maturity at birth. Pediatr Res 84:71-79
Piette, Elizabeth R; Moore, Jason H (2018) Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV). BioData Min 11:6
Tragante, Vinicius; Hemerich, Daiane; Alshabeeb, Mohammad et al. (2018) Druggability of Coronary Artery Disease Risk Loci. Circ Genom Precis Med 11:e001977
Teumer, Alexander; Gambaro, Giovanni; Corre, Tanguy et al. (2018) Negative effect of vitamin D on kidney function: a Mendelian randomization study. Nephrol Dial Transplant 33:2139-2145
Beaulieu-Jones, Brett K; Lavage, Daniel R; Snyder, John W et al. (2018) Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis. JMIR Med Inform 6:e11
Manduchi, Elisabetta; Chesi, Alessandra; Hall, Molly A et al. (2018) Leveraging putative enhancer-promoter interactions to investigate two-way epistasis in Type 2 Diabetes GWAS. Pac Symp Biocomput 23:548-558
Vajravelu, Ravy K; Scott, Frank I; Mamtani, Ronac et al. (2018) Medication class enrichment analysis: a novel algorithm to analyze multiple pharmacologic exposures simultaneously using electronic health record data. J Am Med Inform Assoc 25:780-789

Showing the most recent 10 out of 157 publications