For seventeen years this grant has developed statistical and computational tools vital to gene mapping. During that period, technology and genomic data have changed dramatically. Expression and genotyping chips have become standard scientific tools, and the full genomes from a host of organisms, including the human species, have been sequenced. Computers have continued to grow exponentially in speed and memory. These parallel advances have powered hundreds of successful human gene mapping studies for both Mendelian and complex traits. Unfortunately, these successes have shed light on only a small fraction of the genetic heritability of complex traits. This is hardly surprising as current technology stresses common SNPs, and selection tends to drive common deleterious mutations to extinction.
There are several candidates for the missing dark matter of genetic epidemiology. Among these are (a) copy number variants, (b) polygenes of small effect, (c) missed interactions among genes and between genes and environment, (d) epigenetic effects, (e) variation across populations, (f) rare variants, and (g) non-coding RNA. As sequencing costs continue to decline rapidly, the search for rare variants via large-scale sequencing is perhaps the most promising new route to disease gene discovery.
In the next cycle of this grant, we plan to build on our successes, with particular stress on mining the growing avalanche of sequence data. The statistical analysis of sequence data is surely one of the most complex undertakings in all of modern biology. Currently, the data being generated are in danger of being squandered due to a lack of good analysis tools. Beyond raw sequence data, interesting connections are being forged by the bioinformatic and functional genomic communities. We desperately need to bring this accumulated knowledge in mutation severity prediction and gene interactions to bear on gene mapping.
In our opinion, the extraordinarily fast, coordinate descent forms of penalized regression are the best candidate tools for successful analysis of high-dimensional sequencing data. Genetic analysis via penalized regression easily handles non-genetic predictors, uncertainty in genotype and sequence calls, corrections for ethnic admixture, quantitative traits and disease dichotomies, gene-gene and gene-environment interactions, and both rare and common variants.
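For readers unfamiliar with the technique, the following is a minimal sketch of cyclic coordinate descent for lasso-penalized linear regression. It is an illustration only and not the MENDEL implementation: the function names, the squared-error loss with a 1/(2n) scaling, the absence of an intercept, and the toy data are all assumptions made for this example. Each coefficient is updated in turn by soft-thresholding its correlation with the partial residual, which is why weak predictors are set exactly to zero.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values in [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100, tol=1e-8):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates.

    X is a list of n rows with p predictor values each; y has n responses.
    No intercept is fitted, so predictors and response are assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, and beta starts at zero
    # Mean squared value of each predictor column (hypothetical helper array).
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero predictor carries no information
            # Correlation of predictor j with the partial residual, i.e. the
            # residual with predictor j's current contribution added back.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                # Keep the residual in sync so each update costs O(n).
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved appreciably in a full sweep
    return beta


# Toy data (made up): the first predictor has a strong effect, the second a weak one.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

In this toy design the two predictors are orthogonal, so the penalty shrinks the strong effect and removes the weak one entirely. The same one-coefficient-at-a-time update extends to logistic loss for disease dichotomies, at the cost of a slightly more involved inner step.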
Our first aim is to extend our penalized regression algorithms to incorporate prior biological knowledge at the variant level; distinguish modes of inheritance at the gene level; capture multivariate phenotypes; and exploit network information in interaction testing. Additional aims of this proposal include new methods that use sequence data to rule out variants' involvement in Mendelian traits; extensions to our tests for intergenerational effects; and more efficient algorithms for genome-wide association tests based on pedigree data. Finally, we will implement all these innovations in our mature, freely distributed statistical genetics package MENDEL.
The Human Genome Project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information now far outstrips our ability to use that information to understand the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and to trait variation.