Genome-wide association studies (GWAS) are commonplace despite the lack of a comprehensive bioinformatics approach to analyzing the data. The common method of analysis is to apply parametric statistics and then adjust for the large number of tests performed to limit false positives (i.e. type I errors). Some prefer this agnostic approach because it makes no assumptions about which genes or genomic regions might be important; the underlying logic is that the data themselves should reveal where the important genetic variants are. The goal of our proposed research program is to compare this agnostic approach directly with a bioinformatics approach that selects associated SNPs based on expert knowledge about biochemical pathways and gene function. We propose to develop a bioinformatics approach for selecting SNPs from a GWAS using knowledge about the biology of the genes being studied and the molecular pathology of disease (AIM 1). We will modify and extend the Exploratory Visual Analysis (EVA) database and software, which was originally designed for microarray studies with pilot funding from the NLM BISTI program. We will then apply this bioinformatics approach, alongside an agnostic statistical approach, to detect SNPs associated with plasma levels of tissue plasminogen activator (t-PA) and plasminogen activator inhibitor one (PAI-1) in a large population-based sample of Caucasians (n=2000) from the PREVEND study in Groningen, The Netherlands (AIM 2). SNPs identified by both methods in the PREVEND study will be evaluated first for replication in an independent population-based sample of Caucasians (n=2000) from the Rotterdam Study in the Netherlands, and then for validation in a population-based sample of Blacks (n=2000) from the HeART Study in Ghana, Africa (AIM 3). Finally, we will compare how many, and which, SNPs replicate and validate under the statistical approach versus the bioinformatics approach (AIM 4).
Our working hypothesis is that the bioinformatics approach will yield more validated, and hence more likely genuine, SNP associations.
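To make the contrast concrete, the sketch below shows one way the two selection strategies might differ. It assumes a simple Bonferroni correction as the "adjust for the large number of tests" step, which is one common choice and not necessarily the one used in this project. The agnostic approach divides alpha by every SNP tested. The knowledge-based approach first restricts testing to SNPs mapped to genes in a prior pathway set, so it corrects for far fewer tests. PLAT and SERPINE1 are the genes encoding t-PA and PAI-1. All SNP IDs, p-values, the SNP-to-gene map, `GENE_X`, and the "replicated" set are invented for illustration and are not study data.

```python
"""Hypothetical sketch: agnostic vs knowledge-based SNP selection.

Assumes Bonferroni correction; every SNP ID and p-value here is invented.
"""

ALPHA = 0.05  # family-wise error rate (assumed)


def bonferroni_hits(pvalues, alpha=ALPHA):
    """Return SNPs whose p-value is below alpha / (number of tests)."""
    threshold = alpha / len(pvalues)
    return {snp for snp, p in pvalues.items() if p < threshold}


def knowledge_based_hits(pvalues, snp_to_gene, pathway_genes, alpha=ALPHA):
    """Test only SNPs mapped to prior pathway genes, correcting for that smaller set."""
    candidates = {
        snp: p for snp, p in pvalues.items()
        if snp_to_gene.get(snp) in pathway_genes
    }
    return bonferroni_hits(candidates, alpha) if candidates else set()


def replication_rate(hits, replicated):
    """Fraction of selected SNPs that also appear in a replication set."""
    return len(hits & replicated) / len(hits) if hits else 0.0


# Toy GWAS: 1,000 SNPs, mostly null (p = 0.5), plus three invented signals.
pvalues = {f"rs{i}": 0.5 for i in range(1000)}
pvalues["rs1"] = 1e-6  # strong signal
pvalues["rs2"] = 1e-4  # moderate signal near a pathway gene
pvalues["rs3"] = 1e-4  # moderate signal outside the pathway

# Hypothetical SNP-to-gene annotation and pathway gene set.
snp_to_gene = {"rs1": "PLAT", "rs2": "SERPINE1", "rs3": "GENE_X", "rs4": "SERPINE1"}
pathway_genes = {"PLAT", "SERPINE1"}

agnostic = bonferroni_hits(pvalues)  # threshold 0.05 / 1000
knowledge = knowledge_based_hits(pvalues, snp_to_gene, pathway_genes)  # 0.05 / 3

# Invented replication-cohort result, used only to show the AIM 4 style comparison.
replicated = {"rs1", "rs2"}

print(sorted(agnostic))   # ['rs1']
print(sorted(knowledge))  # ['rs1', 'rs2']
print(replication_rate(agnostic, replicated), replication_rate(knowledge, replicated))
```

Restricting the test set raises the per-test threshold, which lets the moderate pathway signal (`rs2`) through while the equally strong off-pathway signal (`rs3`) is not tested at all. That trade-off is exactly what a replication comparison is meant to evaluate.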

Public Health Relevance

The technology for measuring the human genome is advancing at a rapid pace. Despite these advances, the computational methods for analyzing the data have not kept pace. We will develop new computer algorithms and software for identifying genetic biomarkers of common human diseases, and then compare this approach with an analysis strategy based only on statistical methods.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM010098-03
Application #
8143552
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2009-09-30
Project End
2013-09-29
Budget Start
2011-09-30
Budget End
2012-09-29
Support Year
3
Fiscal Year
2011
Total Cost
$315,706
Indirect Cost
Name
Dartmouth College
Department
Genetics
Type
Schools of Medicine
DUNS #
041027822
City
Hanover
State
NH
Country
United States
Zip Code
03755