Genome-wide association studies (GWAS) and next-generation sequencing are now commonplace, yet comprehensive bioinformatics approaches for relating genotype to phenotype are still lacking. The common method of analysis is to apply parametric statistics to each variant and then adjust for the large number of tests performed in order to limit false positives. Some investigators prefer this agnostic approach because it makes no assumptions about which genes or genomic regions might be important. The goal of this continuation of our research program is to develop and evaluate a bioinformatics approach that uses gene set enrichment (GSE) methods to analyze genetic associations in the context of expert knowledge about biochemical pathways, gene function, and experimental results. An important challenge in this domain is the quality of the expert knowledge available in public databases such as the Gene Ontology (GO). We first propose to develop and evaluate a novel Data-driven Ontology Refinement Algorithm (DORA) for improving the quality of genetic and genomic annotations (AIM 1); better annotations will in turn improve GSE results. We will then develop a comprehensive bioinformatics approach to the analysis of high-throughput genetic association results that treats functional DNA elements, genes, and gene function as important contexts. First, we will determine whether incorporating data from the Encyclopedia of DNA Elements (ENCODE) database improves GSE analysis at the level of gene regions (AIM 2). Next, we will determine whether GO annotations refined by our DORA algorithm (DORA-GO) improve GSE analysis at the gene set level beyond what GO alone provides (AIM 3). Finally, we will assess the validity of these methods by testing whether the results replicate in independent data (AIM 4).
AIMS 1-4 will be accomplished using several large population-based genetic studies of preclinical cardiovascular disease (CVD), as measured by left ventricular mass (LVM). Our working hypothesis is that our novel bioinformatics methods, which embrace rather than ignore prior biological knowledge, will yield more replicated, and hence more likely genuine, genetic associations.
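To make the two analysis styles contrasted above concrete, the sketch below shows a conventional per-gene Bonferroni adjustment followed by one common form of gene set enrichment: a one-sided hypergeometric over-representation test of whether significant genes fall into an annotated gene set (for example, a single GO term) more often than chance would predict. The abstract does not specify which GSE statistic the project uses, so this test is a stand-in rather than the project's method; it does not implement DORA or the ENCODE-based gene-region analysis. The function names, gene identifiers, and p-values are hypothetical.

```python
from math import comb


def bonferroni_significant(pvalues: dict[str, float], alpha: float = 0.05) -> set[str]:
    """Return genes whose p-value passes the Bonferroni threshold alpha / m.

    Hypothetical helper: m is the number of tests (here, one test per gene).
    """
    m = len(pvalues)
    return {gene for gene, p in pvalues.items() if p <= alpha / m}


def hypergeometric_enrichment(universe: set[str], gene_set: set[str], hits: set[str]) -> float:
    """Upper-tail hypergeometric p-value P(X >= k) for the overlap of hits with gene_set.

    N = genes tested, K = genes annotated to the set, n = significant genes,
    k = significant genes that are also in the set.
    """
    N = len(universe)
    K = len(gene_set & universe)
    n = len(hits & universe)
    k = len(hits & gene_set & universe)
    # Exact integer arithmetic; math.comb returns 0 when the lower index exceeds the upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical example: 10 tested genes, one annotated gene set of 3 genes.
universe = {f"G{i}" for i in range(1, 11)}
pvalues = {g: 0.3 for g in universe}
pvalues.update({"G1": 1e-6, "G2": 2e-6, "G3": 0.5})

hits = bonferroni_significant(pvalues)      # threshold 0.05 / 10 = 0.005
pathway = {"G1", "G2", "G3"}                # hypothetical gene set
p_enrich = hypergeometric_enrichment(universe, pathway, hits)
print(sorted(hits), p_enrich)               # ['G1', 'G2'] and 3/45, about 0.0667
```

The test is one-sided (upper tail) because enrichment asks only whether a gene set is over-represented among significant genes. In this framing, the quality of the `gene_set` annotations feeds directly into the result, which is why refining annotations (AIM 1) is expected to matter for GSE.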

Public Health Relevance

The bioinformatics methods and software developed and distributed as part of this research project will play an important role in advancing our ability to fully exploit genome-wide association data for common, complex diseases.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Special Emphasis Panel (ZLM1-ZH-C (01))
Program Officer: Ye, Jane
Dartmouth College
Schools of Medicine
United States