Bioinformatics Strategies for Genome Wide Association Studies

Moore, Jason

Abstract

The promise of precision medicine is to edit a patient's DNA and/or administer therapeutics targeting etiologic molecules that prevent or reverse the disease process using a tailored design. All of this happens at the level of the individual and requires precision knowledge of that patient's biology. In stark contrast, much of the knowledge we possess about genomic risk factors comes from statistical measures of association from human populations. The conceptual and practical disconnect between the populations we study and the individuals we want to treat is a major source of confusion about how to move forward in an era driven by genome technology. The primary goal of this proposal is to develop novel informatics methodology and software to facilitate precision medicine by connecting population and individual genomic phenomena. We propose here a Virtual Genomic Medicine (VGMed) workbench where clinicians can carry out thought experiments about the treatment of individual patients using models of disease risk derived from population-level studies. This will be accomplished by first developing a novel Genomics-guided Automated Machine Learning (GAML) algorithm for deriving risk models from real data that is accessible to clinicians (AIM 1). We will then develop a novel simulation approach that is able to generate artificial data that preserves the distribution of genetic effects observed in the real data while maintaining other characteristics such as genotype frequencies (AIM 2). This will generate open data allowing anyone to perform virtual interventions on patients derived from a population-level risk distribution. The workbench will allow editing of individual genotypes and simulate the administration of drugs by editing machine learning parameters in the simulation model (AIM 3). The change in risk and disease status for the specific patient will be tracked in real time. Finally, we provide a feature in the workbench that will allow the clinician to generate specific hypotheses about individual genetic variants that can then be validated using integrated knowledge sources that include databases such as PubMed and ClinVar, thus giving the user immediate feedback (AIM 4). All methods and software will be provided as open-source (AIM 5).
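
The virtual interventions described in AIM 3 can be pictured with a small Python sketch. It is not the VGMed or GAML implementation, which the abstract does not specify. It assumes a toy logistic risk model, and every variant name, coefficient, and function name below is hypothetical. A "gene edit" changes one allele count for a single patient, a "drug" scales down one model parameter, and the patient's predicted risk is recomputed after each change.

```python
import math

# Hypothetical population-level risk model: log-odds as a linear function of
# minor-allele counts (0, 1, 2) plus one pairwise interaction term.
# All variant names and coefficients are invented for illustration.
INTERCEPT = -2.0
WEIGHTS = {"snp_a": 0.4, "snp_b": 0.3}
INTERACTION = (("snp_a", "snp_b"), 0.5)


def risk(genotype, weights=WEIGHTS):
    """Return the predicted disease probability for one patient's genotype."""
    log_odds = INTERCEPT + sum(weights[s] * genotype[s] for s in weights)
    (s1, s2), w = INTERACTION
    log_odds += w * genotype[s1] * genotype[s2]
    return 1.0 / (1.0 + math.exp(-log_odds))


def edit_genotype(genotype, snp, new_count):
    """Virtual gene edit: return a copy with one variant's allele count changed."""
    if new_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    edited = dict(genotype)
    edited[snp] = new_count
    return edited


def administer_drug(weights, snp, attenuation):
    """Virtual drug: return a copy of the weights with one main effect scaled."""
    adjusted = dict(weights)
    adjusted[snp] = weights[snp] * attenuation
    return adjusted


# One simulated patient; the original record is never modified in place.
patient = {"snp_a": 2, "snp_b": 1}
baseline = risk(patient)
after_edit = risk(edit_genotype(patient, "snp_a", 0))
after_drug = risk(patient, administer_drug(WEIGHTS, "snp_b", 0.5))
print(f"baseline={baseline:.3f} edit={after_edit:.3f} drug={after_drug:.3f}")
```

Both interventions return modified copies, so the baseline patient and model stay available for comparison. This mirrors the idea of tracking the change in risk for a specific patient, although a real workbench would use a model learned from data rather than fixed coefficients.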

Public Health Relevance

Most genetic studies of common human diseases result in statistical summaries of risk derived from human populations. These statistical summaries are of limited use for determining the health of an individual. This proposal will create new computer algorithms and software to help clinicians and researchers connect population-level statistics with individual-level genetic effects to advance our understanding of how to treat patients based on their own unique genetic makeup.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 2R01LM010098-10
Application #: 9661406
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 2009-09-30
Project End: 2024-02-28
Budget Start: 2019-03-05
Budget End: 2020-02-29
Support Year: 10
Fiscal Year: 2019

Institution

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Publications

Teumer, Alexander; Gambaro, Giovanni; Corre, Tanguy et al. (2018) Negative effect of vitamin D on kidney function: a Mendelian randomization study. Nephrol Dial Transplant 33:2139-2145

Beaulieu-Jones, Brett K; Lavage, Daniel R; Snyder, John W et al. (2018) Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis. JMIR Med Inform 6:e11

Manduchi, Elisabetta; Chesi, Alessandra; Hall, Molly A et al. (2018) Leveraging putative enhancer-promoter interactions to investigate two-way epistasis in Type 2 Diabetes GWAS. Pac Symp Biocomput 23:548-558

Vajravelu, Ravy K; Scott, Frank I; Mamtani, Ronac et al. (2018) Medication class enrichment analysis: a novel algorithm to analyze multiple pharmacologic exposures simultaneously using electronic health record data. J Am Med Inform Assoc 25:780-789

Piette, Elizabeth R; Moore, Jason H (2018) Identification of epistatic interactions between the human RNA demethylases FTO and ALKBH5 with gene set enrichment analysis informed by differential methylation. BMC Proc 12:59

Urbanowicz, Ryan J; Olson, Randal S; Schmitt, Peter et al. (2018) Benchmarking relief-based feature selection methods for bioinformatics data mining. J Biomed Inform 85:168-188

Manduchi, Elisabetta; Williams, Scott M; Chesi, Alessandra et al. (2018) Leveraging epigenomics and contactomics data to investigate SNP pairs in GWAS. Hum Genet 137:413-425

Urbanowicz, Ryan J; Meeker, Melissa; La Cava, William et al. (2018) Relief-based feature selection: Introduction and review. J Biomed Inform 85:189-203

Chernikova, Diana A; Madan, Juliette C; Housman, Molly L et al. (2018) The premature infant gut microbiome during the first 6 weeks of life differs based on gestational maturity at birth. Pediatr Res 84:71-79

Piette, Elizabeth R; Moore, Jason H (2018) Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV). BioData Min 11:6

Showing the most recent 10 out of 157 publications
