For nearly two decades this grant has developed statistical and computational tools vital to gene mapping. During that period, technology and genomic data changed dramatically. Expression and genotyping chips became standard scientific tools; the full genomes of a host of organisms, including humans, were sequenced; and low-cost sequencing went from fantasy to reality. The last decade has also witnessed a shift in focus from common to rare SNVs (single nucleotide variants) and from moderate-sized studies to large consortium studies. Simultaneously, computers have grown exponentially in speed and memory. These parallel advances have powered thousands of successful human gene mapping studies of both Mendelian and complex traits. Because these successes have shed light on only a fraction of the heritability of common traits, statistical genetics has not yet reached its endgame; there is still a need for new ideas and better software. We plan to build on our previous successes, with particular emphasis on adapting modern data mining methods to genetic applications. We and others have made great strides in applying penalized estimation and model selection in genomics. Genetic analysis via penalized regression easily handles non-genetic predictors, uncertainty in genotype and sequence calls, corrections for ethnic admixture, quantitative traits and disease dichotomies, gene-gene and gene-environment interactions, and both rare and common variants. Unfortunately, it is now apparent that penalized estimation is hampered by severe shrinkage and inflated false positive rates. Our recent development of proximal distance algorithms and of regression guided by the AIC (Akaike information criterion) shows that severe shrinkage can be eliminated and false positive rates tamed. We are also convinced that haplotypes have been underexploited in genetic analysis.
Haplotypes flag local gene sharing, serve as surrogates for rare variants, capture intragenic interactions, and enable both fixed- and random-effects QTL (quantitative trait locus) mapping. Our extensive list of aims should not be interpreted as a lack of focus; our track record shows that we can make progress on several fronts simultaneously, and all of our efforts are directed toward sharpening the tools of genetic analysis. As our programs SIMWALK, MENDEL, and ADMIXTURE illustrate, we are committed to translating theoretical advances into user-friendly software. These programs are notable for their comprehensiveness, speed, reliability, small memory footprint, and detailed documentation. The goal of this grant is to empower the very large genetic studies on the horizon, and collectively our Specific Aims go a long way toward that goal.
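To make the shrinkage problem concrete, here is a minimal Python sketch contrasting a lasso fit with a proximal distance fit for sparse least squares. It is an illustration under stated assumptions, not the grant's software (for example MENDEL): the function names (`soft_threshold`, `project_sparse`, `solve`, `proximal_distance_sparse`), the sparsity constraint "at most k nonzero coefficients", and the penalty schedule (`rho`, `rho_mult`, `outer`, `inner`) are hypothetical choices made for this example. The proximal distance idea is to minimize the loss plus (rho/2)·dist(beta, S_k)^2, where S_k is the constraint set, while increasing rho. Each majorization-minimization step replaces the distance term with ||beta - P(beta_n)||^2, where P is projection onto S_k, so the update reduces to a ridge-like linear system.

```python
# Hedged sketch: lasso shrinkage versus a proximal distance fit for
# k-sparse least squares. Illustrative only; names and settings are hypothetical.


def soft_threshold(b, lam):
    """Lasso estimate for an orthonormal design (X^T X = I): each least-squares
    coefficient in b is moved toward zero by lam, which is the shrinkage bias."""
    return [max(abs(v) - lam, 0.0) * (1.0 if v > 0 else -1.0) for v in b]


def project_sparse(beta, k):
    """Euclidean projection onto S_k = {vectors with at most k nonzeros}:
    keep the k entries of largest magnitude and zero the rest."""
    keep = set(sorted(range(len(beta)), key=lambda i: -abs(beta[i]))[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(beta)]


def solve(a, rhs):
    """Solve the small dense system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def proximal_distance_sparse(X, y, k, rho=1.0, rho_mult=1.5, outer=40, inner=50):
    """Approximately minimize 0.5*||y - X beta||^2 over beta in S_k.

    The constraint is replaced by the penalty (rho/2)*dist(beta, S_k)^2 with rho
    increased geometrically. Each MM step majorizes dist(beta, S_k)^2 by
    ||beta - P(beta_n)||^2 and minimizes the surrogate exactly by solving
    (X^T X + rho*I) beta = X^T y + rho*P(beta_n).
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    beta = [0.0] * p
    for _ in range(outer):
        a = [[xtx[i][j] + (rho if i == j else 0.0) for j in range(p)] for i in range(p)]
        for _ in range(inner):
            target = project_sparse(beta, k)  # nearest k-sparse point to the current iterate
            beta = solve(a, [xty[i] + rho * target[i] for i in range(p)])
        rho *= rho_mult
    return project_sparse(beta, k)  # final projection makes the estimate exactly k-sparse
```

For an orthonormal design whose least-squares estimate is b = (3, 0.2, -2, 0.1), `soft_threshold(b, 0.5)` returns the lasso fit [2.5, 0.0, -1.5, 0.0], pulling both real effects toward zero, while `proximal_distance_sparse` with k = 2 returns [3, 0, -2, 0] up to rounding, selecting the same two predictors without shrinking them. Annealing rho upward with an inner MM loop at each value is one simple schedule; it is an assumption of this sketch, not a tuned setting.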

Public Health Relevance

The Human Genome Project and its offshoots have dramatically increased the amount of available genetic data. In fact, the scientific community's ability to collect genetic information now far outstrips its ability to use this information to understand the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of this vast amount of genetic data, and thereby improve researchers' ability to map and characterize genes that underlie human diseases and trait variation.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 2R01GM053275-22
Application #: 8962326
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
Project Start: 1995-08-01
Project End: 2020-03-31
Budget Start: 2016-06-02
Budget End: 2017-03-31
Support Year: 22
Fiscal Year: 2016

Organization Name: University of California Los Angeles
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Medicine
DUNS #: 092530369
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Publications

Lake, James A; Larsen, Joseph; Tran, Dan Thy et al. (2018) Uncovering the Genomic Origins of Life. Genome Biol Evol 10:1705-1714
vonHoldt, Bridgett M; Ji, Sarah S; Aardema, Matthew L et al. (2018) Activity of Genes with Functions in Human Williams-Beuren Syndrome Is Impacted by Mobile Element Insertions in the Gray Wolf Genome. Genome Biol Evol 10:1546-1553
Paul, Kimberly C; Sinsheimer, Janet S; Cockburn, Myles et al. (2018) NFE2L2, PPARGC1α, and pesticides and Parkinson's disease risk and progression. Mech Ageing Dev 173:1-8
Lin, Liang-Yu; Chun Chang, Sunny; O'Hearn, Jim et al. (2018) Systems Genetics Approach to Biomarker Discovery: GPNMB and Heart Failure in Mice and Humans. G3 (Bethesda) 8:3499-3506
Gilbert, Princess S; Wu, Jing; Simon, Margaret W et al. (2018) Filtering nucleotide sites by phylogenetic signal to noise ratio increases confidence in the Neoaves phylogeny generated from ultraconserved elements. Mol Phylogenet Evol 126:116-128
Zhang, Yiwen; Zhou, Hua; Zhou, Jin et al. (2017) Regression Models For Multivariate Count Data. J Comput Graph Stat 26:1-13
Paul, Kimberly C; Sinsheimer, Janet S; Cockburn, Myles et al. (2017) Organophosphate pesticides and PON1 L55M in Parkinson's disease progression. Environ Int 107:75-81
Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé et al. (2017) Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. Am J Hum Genet 100:473-487
Zhou, Hua; Blangero, John; Dyer, Thomas D et al. (2017) Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data. Genet Epidemiol 41:174-186
Kichaev, Gleb; Roytman, Megan; Johnson, Ruth et al. (2017) Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33:248-255

The list above shows the 10 most recent of this project's 156 publications.