For seventeen years this grant has developed statistical and computational tools vital to gene mapping. During that period, technology and genomic data have changed dramatically. Expression and genotyping chips have become standard scientific tools, and the full genomes from a host of organisms, including the human species, have been sequenced. Computers have continued to grow exponentially in speed and memory. These parallel advances have powered hundreds of successful human gene mapping studies for both Mendelian and complex traits. Unfortunately, these successes have shed light on only a small fraction of the genetic heritability of complex traits. This is hardly surprising as current technology stresses common SNPs, and selection tends to drive common deleterious mutations to extinction. There are several candidates for the missing dark matter of genetic epidemiology. Among these are (a) copy number variants, (b) polygenes of small effect, (c) missed interactions among genes and between genes and environment, (d) epigenetic effects, (e) variation across populations, (f) rare variants, and (g) non-coding RNA. As sequencing costs continue to decline rapidly, the search for rare variants via large-scale sequencing is perhaps the most promising new route to disease gene discovery. In the next cycle of this grant, we plan to build on our successes, with particular stress on mining the growing avalanche of sequence data. The statistical analysis of sequence data is surely one of the most complex undertakings in all of modern biology. Currently, the data being generated are in danger of being squandered due to a lack of good analysis tools. Beyond raw sequence data, interesting connections are being forged by the bioinformatic and functional genomic communities. We desperately need to bring this accumulated knowledge in mutation severity prediction and gene interactions to bear on gene mapping. 
In our opinion, the extraordinarily fast, coordinate descent forms of penalized regression are the best candidate tools for successful analysis of high-dimensional sequencing data. Genetic analysis via penalized regression easily handles non-genetic predictors, uncertainty in genotype and sequence calls, corrections for ethnic admixture, quantitative traits and disease dichotomies, gene-gene and gene-environment interactions, and both rare and common variants.
Our first aim is to extend our penalized regression algorithms to incorporate prior biological knowledge at the variant level; distinguish modes of inheritance at the gene level; capture multivariate phenotypes; and exploit network information in interaction testing. Additional aims of this proposal include new methods to use sequence data to rule out variants' involvement in Mendelian traits; extensions to our tests for intergenerational effects; and more efficient algorithms for genome-wide association tests based on pedigree data. Finally, we will implement all these innovations in our mature, freely distributed, statistical genetics package MENDEL.

Public Health Relevance

The human genome project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information has currently far outstripped our ability to make use of this information in understanding the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and to trait variation.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM053275-21
Application #
8827784
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Project Start
1995-08-01
Project End
2016-03-31
Budget Start
2015-04-01
Budget End
2016-03-31
Support Year
21
Fiscal Year
2015
Name
University of California Los Angeles
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
092530369
City
Los Angeles
State
CA
Country
United States
Zip Code
90095