Bioinformatics Strategies for Genome-Wide Association Studies

Moore, Jason; Asselbergs, Folkert; Williams, Scott

Abstract

Genome-wide association studies (GWAS) are commonplace despite the lack of a comprehensive bioinformatics approach to the analysis of the data. The common method of analysis is to employ parametric statistics and then adjust for the large number of tests performed to limit false positives (i.e., type I errors). This agnostic approach is preferred by some because no assumptions are made about which genes or genomic regions might be important. This logic suggests that the data should tell us where the important genetic variants are. The goal of our proposed research program is to compare this agnostic approach directly with a bioinformatics approach that selects associated SNPs based on expert knowledge about biochemical pathways and gene function. We propose to develop a bioinformatics approach for selecting SNPs from a GWAS using knowledge about the biology of the genes being studied and the molecular pathology of disease (AIM 1). We will modify and extend the Exploratory Visual Analysis (EVA) database and software, which was originally designed for microarray studies with pilot funding from the NLM BISTI program. We will then use this bioinformatics approach along with an agnostic statistical approach for detecting SNPs associated with plasma levels of tissue plasminogen activator (t-PA) and plasminogen activator inhibitor one (PAI-1) in a large population-based sample of Caucasians (n=2000) from the PREVEND study in Groningen, The Netherlands (AIM 2). Those SNPs identified by both methods in the PREVEND study will be evaluated first for replication in an independent population-based sample of Caucasians (n=2000) from the Rotterdam Study in the Netherlands and then for validation in a population-based sample of Blacks (n=2000) from the HeART Study in Ghana, Africa (AIM 3). Finally, we will compare how many and which SNPs replicate and validate using the statistical approach and the bioinformatics approach (AIM 4).
Our working hypothesis is that the bioinformatics approach will yield more validated SNPs, and hence more true associations, than the agnostic statistical approach.
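
The contrast between the two strategies can be illustrated with a minimal sketch. It is not the method of the proposed program: the abstract does not name a specific correction or filtering rule, so the sketch assumes a Bonferroni adjustment for the agnostic approach and a simple candidate-gene filter as a stand-in for the knowledge-based approach. The SNP identifiers, p-values, placeholder gene names (GENE_A, GENE_B, GENE_C), and the functions `bonferroni_hits` and `knowledge_filtered_hits` are all hypothetical; PLAT and SERPINE1 are used only because they encode t-PA and PAI-1.

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Return SNP ids whose p-value passes a Bonferroni-adjusted threshold.

    Assumption: Bonferroni is used here only as one common way to adjust
    for the number of tests; the abstract does not specify the correction.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return sorted(snp for snp, p in p_values.items() if p <= threshold)


def knowledge_filtered_hits(p_values, snp_to_gene, candidate_genes, alpha=0.05):
    """Keep only SNPs in expert-chosen genes, then adjust for that smaller set.

    Assumption: a gene-membership filter stands in for "expert knowledge";
    the actual selection procedure is not described in the abstract.
    """
    subset = {snp: p for snp, p in p_values.items()
              if snp_to_gene.get(snp) in candidate_genes}
    return bonferroni_hits(subset, alpha)


# Hypothetical toy data: ten SNPs with made-up association p-values.
p_values = {
    "snp_01": 0.0004, "snp_02": 0.012, "snp_03": 0.30, "snp_04": 0.003,
    "snp_05": 0.45, "snp_06": 0.08, "snp_07": 0.70, "snp_08": 0.02,
    "snp_09": 0.55, "snp_10": 0.91,
}
# Hypothetical SNP-to-gene annotation.
snp_to_gene = {
    "snp_01": "PLAT", "snp_02": "SERPINE1", "snp_03": "GENE_A",
    "snp_04": "GENE_B", "snp_05": "PLAT", "snp_06": "SERPINE1",
    "snp_07": "GENE_C", "snp_08": "GENE_A", "snp_09": "GENE_B",
    "snp_10": "GENE_C",
}
candidate_genes = {"PLAT", "SERPINE1"}

agnostic = bonferroni_hits(p_values)            # threshold 0.05 / 10
filtered = knowledge_filtered_hits(p_values, snp_to_gene, candidate_genes)
print("agnostic:", agnostic)
print("knowledge-filtered:", filtered)
```

In this toy example the filter reduces the number of tests from ten to four, so snp_02 passes the less stringent threshold, while snp_04 is never tested because it lies outside the candidate genes. Which strategy yields more SNPs that replicate and validate is the empirical question posed in AIM 4.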

Public Health Relevance

The technology to measure information about the human genome is advancing at a rapid pace. Despite these advances, the computational methods for analyzing the data have not kept pace. We will develop new computer algorithms and software that can be used to identify genetic biomarkers of common human diseases and then compare this approach with an analysis strategy that is based only on statistical methods.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM010098-04
Application #: 8332339
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 2009-09-30
Project End: 2013-09-29
Budget Start: 2012-09-30
Budget End: 2013-09-29
Support Year: 4
Fiscal Year: 2012
Total Cost: $312,154
Indirect Cost: $94,945

Institution

Name: Dartmouth College
Department: Genetics
Type: Schools of Medicine
DUNS #: 041027822

City: Hanover
State: NH
Country: United States
Zip Code: 03755

Publications

Teumer, Alexander; Gambaro, Giovanni; Corre, Tanguy et al. (2018) Negative effect of vitamin D on kidney function: a Mendelian randomization study. Nephrol Dial Transplant 33:2139-2145

Beaulieu-Jones, Brett K; Lavage, Daniel R; Snyder, John W et al. (2018) Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis. JMIR Med Inform 6:e11

Manduchi, Elisabetta; Chesi, Alessandra; Hall, Molly A et al. (2018) Leveraging putative enhancer-promoter interactions to investigate two-way epistasis in Type 2 Diabetes GWAS. Pac Symp Biocomput 23:548-558

Vajravelu, Ravy K; Scott, Frank I; Mamtani, Ronac et al. (2018) Medication class enrichment analysis: a novel algorithm to analyze multiple pharmacologic exposures simultaneously using electronic health record data. J Am Med Inform Assoc 25:780-789

Piette, Elizabeth R; Moore, Jason H (2018) Identification of epistatic interactions between the human RNA demethylases FTO and ALKBH5 with gene set enrichment analysis informed by differential methylation. BMC Proc 12:59

Urbanowicz, Ryan J; Olson, Randal S; Schmitt, Peter et al. (2018) Benchmarking relief-based feature selection methods for bioinformatics data mining. J Biomed Inform 85:168-188

Manduchi, Elisabetta; Williams, Scott M; Chesi, Alessandra et al. (2018) Leveraging epigenomics and contactomics data to investigate SNP pairs in GWAS. Hum Genet 137:413-425

Urbanowicz, Ryan J; Meeker, Melissa; La Cava, William et al. (2018) Relief-based feature selection: Introduction and review. J Biomed Inform 85:189-203

Chernikova, Diana A; Madan, Juliette C; Housman, Molly L et al. (2018) The premature infant gut microbiome during the first 6 weeks of life differs based on gestational maturity at birth. Pediatr Res 84:71-79

Piette, Elizabeth R; Moore, Jason H (2018) Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV). BioData Min 11:6

Showing the most recent 10 out of 157 publications
