Genome-wide association studies (GWAS) and next-generation sequencing are now commonplace despite a lack of comprehensive bioinformatics approaches for relating genotype to phenotype. The common method of analysis is to apply parametric statistics and then adjust for the large number of tests performed to limit false positives. This agnostic approach is preferred by some because it makes no assumptions about which genes or genomic regions might be important. The goal of this continuation of our research program is to develop and evaluate a bioinformatics approach that analyzes genetic associations in the context of expert knowledge about biochemical pathways, gene function, and experimental results using gene set enrichment (GSE) methods. An important challenge in this domain is the quality of the expert knowledge available in public databases such as Gene Ontology (GO). We first propose to develop and evaluate a novel Data-driven Ontology Refinement Algorithm (DORA) for improving the quality of genetic and genomic annotations (AIM 1). Improving the quality of annotations will in turn improve GSE results. We will then develop a comprehensive bioinformatics approach to the analysis of high-throughput genetic association results that considers functional DNA elements, genes, and gene function as important contexts. We will first determine whether considering data from the Encyclopedia of DNA Elements (ENCODE) database improves GSE analysis at the level of gene regions (AIM 2). Next, we will determine whether using GO annotations refined by DORA (DORA-GO) improves GSE analysis at the gene set level above and beyond that provided by GO (AIM 3). Finally, we will determine the validity of these methods by assessing the replication of the results in independent data (AIM 4).
AIMS 1-4 will be accomplished using several large population-based genetic studies of pre-clinical cardiovascular disease (CVD) as measured by left ventricular mass (LVM). Our working hypothesis is that our novel bioinformatics methods, which embrace rather than ignore prior biological knowledge, will yield more genetic associations that replicate and are therefore more likely to be real.
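
The abstract does not say which GSE statistic the project uses. As a rough illustration of the general idea only, the sketch below applies one common choice, a hypergeometric over-representation test, followed by a Bonferroni adjustment across gene sets. The function names, gene identifiers, and gene sets are hypothetical and are not taken from the project's software.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N
    background genes, K of which belong to the gene set."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment(significant, gene_sets, background):
    """Return {set_name: (overlap, raw_p, bonferroni_p)}.

    Assumption: every gene in the background is equally likely to be
    called significant under the null hypothesis.
    """
    background = set(background)
    significant = set(significant) & background
    n_tests = len(gene_sets)
    results = {}
    for name, members in gene_sets.items():
        members = set(members) & background
        overlap = len(significant & members)
        p = hypergeom_upper_tail(
            overlap, len(members), len(significant), len(background)
        )
        # Bonferroni: multiply by the number of gene sets tested, cap at 1.
        results[name] = (overlap, p, min(1.0, p * n_tests))
    return results


# Toy data (hypothetical gene names, for illustration only).
background = [f"g{i}" for i in range(1, 21)]
significant = ["g1", "g2", "g3", "g4", "g5"]
gene_sets = {
    "set_A": ["g1", "g2", "g3", "g4", "g10"],
    "set_B": ["g11", "g12", "g13", "g14", "g15"],
}
results = enrichment(significant, gene_sets, background)
for name, (overlap, p, p_adj) in results.items():
    print(name, overlap, p, p_adj)
```

In this toy run, set_A shares four of its five genes with the significant list and set_B shares none, so only set_A gets a small p-value. Annotation quality matters here because a gene set with wrong or missing members changes the overlap count directly, which is the motivation the abstract gives for refining GO annotations.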

Public Health Relevance

The bioinformatics methods and software developed and distributed as part of this research project will play an important role in advancing our ability to fully exploit genome-wide association data for common, complex diseases.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 2R01LM010098-05
Application #: 8600812
Study Section: Special Emphasis Panel (ZLM1-ZH-C (01))
Program Officer: Ye, Jane
Project Start: 2009-09-30
Project End: 2018-09-29
Budget Start: 2013-09-30
Budget End: 2014-09-29
Support Year: 5
Fiscal Year: 2013
Total Cost: $321,405
Indirect Cost: $95,371

Name: Dartmouth College
Department: Genetics
Type: Schools of Medicine
DUNS #: 041027822
City: Hanover
State: NH
Country: United States
Zip Code: 03755

Beaulieu-Jones, Brett K; Moore, Jason H (2017) MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS. Pac Symp Biocomput 22:207-218
Li, Xiaoyin; Redline, Susan; Zhang, Xiang et al. (2017) Height associated variants demonstrate assortative mating in human populations. Sci Rep 7:15689
Kodaman, Nuri; Sobota, Rafal S; Asselbergs, Folkert W et al. (2017) Genetic Effects on the Correlation Structure of CVD Risk Factors: Exome-Wide Data From a Ghanaian Population. Glob Heart 12:133-140
Smits, Nicole C; Kobayashi, Takashi; Srivastava, Pratyaksh K et al. (2017) HS3ST1 genotype regulates antithrombin's inflammomodulatory tone and associates with atherosclerosis. Matrix Biol 63:69-90
Graham, Britney E; Darabos, Christian; Huang, Minjun et al. (2017) Evolutionarily derived networks to inform disease pathways. Genet Epidemiol 41:866-875
Hall, Molly A; Wallace, John; Lucas, Anastasia et al. (2017) PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies. Nat Commun 8:1167
Holzinger, Emily R; Verma, Shefali S; Moore, Carrie B et al. (2017) Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals. BioData Min 10:25
Yao, Xiaohui; Yan, Jingwen; Kim, Sungeun et al. (2017) Two-dimensional enrichment analysis for mining high-level imaging genetic associations. Brain Inform 4:27-37
Justice, Anne E (see original citation for additional authors) (2017) Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits. Nat Commun 8:14977
Ahmed, Musaddeque; Sallari, Richard C; Guo, Haiyang et al. (2017) Variant Set Enrichment: an R package to identify disease-associated functional genomic regions. BioData Min 10:9

Showing the most recent 10 out of 141 publications