Bioinformatics Strategies for Genome-Wide Association Studies

Moore, Jason; Asselbergs, Folkert; Williams, Scott

Abstract

Genome-wide association studies (GWAS) and next-generation sequencing are now commonplace, yet comprehensive bioinformatics approaches for relating genotype to phenotype are still lacking. The most common analysis strategy is to apply parametric statistics to each genetic variant and then correct for the large number of tests performed to limit false positives. Some prefer this agnostic approach because it makes no assumptions about which genes or genomic regions might be important. The goal of this continuation of our research program is to develop and evaluate a bioinformatics approach that analyzes genetic associations in the context of expert knowledge about biochemical pathways, gene function, and experimental results, using gene set enrichment (GSE) methods. An important challenge for success in this domain is the quality of the expert knowledge available in public databases such as Gene Ontology (GO). We first propose to develop and evaluate a novel Data-driven Ontology Refinement Algorithm (DORA) for improving the quality of genetic and genomic annotations (AIM 1). Improving the quality of annotations will in turn improve GSE results. We will then develop a comprehensive bioinformatics approach to the analysis of high-throughput genetic association results that treats functional DNA elements, genes, and gene function as important contexts. Within this approach, we will determine whether incorporating data from the Encyclopedia of DNA Elements (ENCODE) database improves GSE analysis at the level of gene regions (AIM 2). Next, we will determine whether GO annotations refined by DORA (DORA-GO) improve GSE analysis at the gene set level beyond what the original GO annotations provide (AIM 3). Finally, we will assess the validity of these methods by testing whether their results replicate in independent data (AIM 4).
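To make the two strategies contrasted above concrete, the following Python sketch applies a Bonferroni correction to gene-level p-values and then tests whether one gene set is over-represented among the significant genes with a one-sided hypergeometric test. This is a generic over-representation test chosen for illustration, not the GSE method, DORA, or DORA-GO developed in this project. The gene names, p-values, and gene set are hypothetical, and mapping SNPs to genes is assumed to have been done beforehand.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def gene_set_enrichment(significant, gene_set, background):
    """One-sided over-representation test of gene_set among significant genes.

    Returns (overlap, p_value). Genes outside the background are ignored.
    """
    background = set(background)
    sig = set(significant) & background
    members = set(gene_set) & background
    overlap = len(sig & members)
    p_value = hypergeom_sf(overlap, len(background), len(members), len(sig))
    return overlap, p_value


# Hypothetical gene-level p-values for a 20-gene background.
gene_pvalues = {f"G{i}": 0.5 for i in range(20)}
gene_pvalues.update({"G0": 1e-4, "G1": 5e-4, "G2": 2e-3, "G10": 1e-3})

# Agnostic step: keep only genes that survive Bonferroni correction.
threshold = bonferroni_threshold(0.05, len(gene_pvalues))  # 0.05 / 20
significant = [g for g, p in gene_pvalues.items() if p < threshold]

# Knowledge-driven step: hypothetical annotation-derived gene set (e.g. one pathway term).
pathway = {"G0", "G1", "G2", "G3", "G4"}
overlap, p = gene_set_enrichment(significant, pathway, gene_pvalues)
print(f"threshold={threshold:.4g}, significant={significant}")
print(f"overlap={overlap}, enrichment p={p:.4f}")
```

Here three of the four significant genes fall in a five-gene set drawn from a 20-gene background, which the hypergeometric tail scores as modest enrichment. In practice the quality of the annotations that define `pathway` directly limits what such a test can find, which is the motivation for refining them.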
AIMS 1-4 will be accomplished using several large population-based genetic studies of preclinical cardiovascular disease (CVD), as measured by left ventricular mass (LVM). Our working hypothesis is that our novel bioinformatics methods, which embrace rather than ignore prior biological knowledge, will yield more associations that replicate, and are therefore more likely to be real, than the agnostic approach.
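The abstract does not specify how replication will be scored. One common convention, assumed here for illustration, counts a discovery association as replicated when the independent sample shows an effect in the same direction with a nominal p-value below 0.05. The Python sketch below computes that replication rate; the SNP identifiers, effect sizes, and the genome-wide discovery threshold of 5e-8 are hypothetical inputs, not values from this project.

```python
from dataclasses import dataclass


@dataclass
class Association:
    snp: str        # variant identifier
    beta: float     # estimated effect size; its sign gives the direction
    p_value: float


def replication_rate(discovery, replication, discovery_alpha, replication_alpha=0.05):
    """Fraction of discovery hits that replicate in an independent sample.

    A hit replicates if it is present in the replication data, has
    p < replication_alpha there, and its effect has the same sign.
    """
    by_snp = {a.snp: a for a in replication}
    hits = [a for a in discovery if a.p_value < discovery_alpha]
    if not hits:
        return 0.0
    replicated = 0
    for a in hits:
        r = by_snp.get(a.snp)
        if r is not None and r.p_value < replication_alpha and a.beta * r.beta > 0:
            replicated += 1
    return replicated / len(hits)


# Hypothetical discovery and replication results.
discovery = [
    Association("rs1", 0.30, 1e-8),
    Association("rs2", -0.20, 5e-9),
    Association("rs3", 0.10, 2e-8),
    Association("rs4", 0.05, 1e-2),  # not a discovery hit at 5e-8
]
replication = [
    Association("rs1", 0.25, 1e-3),  # same direction, significant: replicates
    Association("rs2", 0.15, 1e-2),  # opposite direction: does not replicate
    Association("rs3", 0.08, 0.2),   # same direction, not significant
]
rate = replication_rate(discovery, replication, discovery_alpha=5e-8)
print(f"replication rate = {rate:.3f}")
```

Comparing this rate between an agnostic analysis and a knowledge-driven one is one possible way to operationalize the working hypothesis above; other replication criteria (for example, multiple-testing-corrected thresholds in the replication sample) would change the numbers.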

Public Health Relevance

The bioinformatics methods and software developed and distributed as part of this research project will play an important role in advancing our ability to fully exploit genome-wide association data for common, complex diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM010098-08
Application #: 8913769
Study Section: Special Emphasis Panel (ZLM1-ZH-C (01))
Program Officer: Ye, Jane

Project Start: 2009-09-30
Project End: 2018-09-29
Budget Start: 2016-09-30
Budget End: 2017-09-29
Support Year: 8
Fiscal Year: 2016
Total Cost: $337,041
Indirect Cost: $75,766

Institution

Name: University of Pennsylvania
Department
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104


Publications

Piette, Elizabeth R; Moore, Jason H (2018) Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV). BioData Min 11:6

Tragante, Vinicius; Hemerich, Daiane; Alshabeeb, Mohammad et al. (2018) Druggability of Coronary Artery Disease Risk Loci. Circ Genom Precis Med 11:e001977

Teumer, Alexander; Gambaro, Giovanni; Corre, Tanguy et al. (2018) Negative effect of vitamin D on kidney function: a Mendelian randomization study. Nephrol Dial Transplant 33:2139-2145

Beaulieu-Jones, Brett K; Lavage, Daniel R; Snyder, John W et al. (2018) Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis. JMIR Med Inform 6:e11

Manduchi, Elisabetta; Chesi, Alessandra; Hall, Molly A et al. (2018) Leveraging putative enhancer-promoter interactions to investigate two-way epistasis in Type 2 Diabetes GWAS. Pac Symp Biocomput 23:548-558

Vajravelu, Ravy K; Scott, Frank I; Mamtani, Ronac et al. (2018) Medication class enrichment analysis: a novel algorithm to analyze multiple pharmacologic exposures simultaneously using electronic health record data. J Am Med Inform Assoc 25:780-789

Piette, Elizabeth R; Moore, Jason H (2018) Identification of epistatic interactions between the human RNA demethylases FTO and ALKBH5 with gene set enrichment analysis informed by differential methylation. BMC Proc 12:59

Urbanowicz, Ryan J; Olson, Randal S; Schmitt, Peter et al. (2018) Benchmarking relief-based feature selection methods for bioinformatics data mining. J Biomed Inform 85:168-188

Manduchi, Elisabetta; Williams, Scott M; Chesi, Alessandra et al. (2018) Leveraging epigenomics and contactomics data to investigate SNP pairs in GWAS. Hum Genet 137:413-425

Urbanowicz, Ryan J; Meeker, Melissa; La Cava, William et al. (2018) Relief-based feature selection: Introduction and review. J Biomed Inform 85:189-203

The 10 most recent of 157 publications are listed.
