For seventeen years this grant has developed statistical and computational tools vital to gene mapping. During that period, technology and genomic data have changed dramatically. Expression and genotyping chips have become standard scientific tools, and the full genomes of a host of organisms, including humans, have been sequenced. Computers have continued to grow exponentially in speed and memory. These parallel advances have powered hundreds of successful human gene mapping studies for both Mendelian and complex traits. Unfortunately, these successes account for only a small fraction of the genetic heritability of complex traits. This is hardly surprising: current technology emphasizes common SNPs, whereas selection tends to keep deleterious mutations rare. There are several candidates for this missing "dark matter" of genetic epidemiology, among them (a) copy number variants, (b) polygenes of small effect, (c) missed gene-gene and gene-environment interactions, (d) epigenetic effects, (e) variation across populations, (f) rare variants, and (g) non-coding RNA. As sequencing costs continue to fall rapidly, the search for rare variants through large-scale sequencing is perhaps the most promising new route to disease gene discovery. In the next cycle of this grant, we plan to build on our successes, with particular emphasis on mining the growing avalanche of sequence data. The statistical analysis of sequence data is surely one of the most complex undertakings in modern biology, and without good analysis tools the data now being generated are in danger of being squandered. Beyond raw sequence data, the bioinformatics and functional genomics communities are forging interesting connections, and we urgently need to bring their accumulated knowledge of mutation severity prediction and gene interactions to bear on gene mapping.
In our opinion, the extraordinarily fast coordinate-descent forms of penalized regression are the best candidate tools for analyzing high-dimensional sequencing data. Genetic analysis via penalized regression readily handles non-genetic predictors, uncertainty in genotype and sequence calls, corrections for ethnic admixture, quantitative traits and disease dichotomies, gene-gene and gene-environment interactions, and both rare and common variants.
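As a rough illustration of the coordinate-descent idea, the sketch below fits a lasso-penalized linear regression of a quantitative trait on additively coded SNP genotypes (0, 1, or 2 copies of the minor allele). It is a generic textbook version, not the algorithm implemented in MENDEL; the function names, the simulated data, the effect sizes, and the penalty value `lam` are all hypothetical choices made for illustration. Each coordinate update is a closed-form soft-thresholding step, which is what makes these methods cheap enough for very many predictors.

```python
import random


def soft_threshold(value: float, threshold: float) -> float:
    """Lasso proximal step: shrink value toward zero by threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, max_iter=1000, tol=1e-8):
    """Minimize (1/(2n))||y - b0 - X b||^2 + lam * sum_j |b_j| by cyclic coordinate descent.

    X is a list of n rows, each holding p genotype codes (hypothetical additive
    coding: 0, 1, or 2 minor alleles). The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    # Center each SNP column and the trait so the intercept drops out of the updates.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    # Per-SNP curvature term (1/n) * sum_i x_ij^2.
    curv = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    resid = [yi - y_mean for yi in y]  # residuals when all SNP effects are zero
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if curv[j] == 0.0:
                continue  # monomorphic SNP carries no information; keep b_j = 0
            # Partial-residual correlation: (1/n) sum_i x_ij * (r_i + x_ij * b_j).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + curv[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / curv[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Update residuals in place so the next coordinate sees the change.
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(col_means, beta))
    return intercept, beta


def simulate(n=200, p=10, maf=0.3, seed=1):
    """Hypothetical data: SNP 0 raises the trait, SNP 3 lowers it, the rest are null."""
    rng = random.Random(seed)
    X = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(p)] for _ in range(n)]
    y = [1.0 + 0.8 * row[0] - 0.6 * row[3] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    b0, beta = lasso_coordinate_descent(X, y, lam=0.1)
    print(round(b0, 3), [round(b, 3) for b in beta])
```

Because each update touches only one SNP and the shared residual vector, a full sweep costs on the order of individuals times variants, and null SNPs tend to stay at exactly zero. Practical implementations usually add further refinements, such as warm starts along a grid of penalty values, which are omitted here.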
Our first aim is to extend our penalized regression algorithms to incorporate prior biological knowledge at the variant level; distinguish modes of inheritance at the gene level; capture multivariate phenotypes; and exploit network information in interaction testing. Additional aims include new methods that use sequence data to rule out variants' involvement in Mendelian traits; extensions to our tests for intergenerational effects; and more efficient algorithms for genome-wide association tests based on pedigree data. Finally, we will implement all these innovations in MENDEL, our mature, freely distributed statistical genetics package.
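To make "modes of inheritance" concrete, a single biallelic genotype can be recoded as a regression predictor in the standard additive, dominant, or recessive ways, and comparing fits under these codings is one common way to ask which mode best explains a trait. The helper below is a hypothetical per-variant illustration of those standard codings only; it is not the gene-level method proposed in this grant, and the name `code_genotype` is invented for this sketch.

```python
def code_genotype(minor_allele_count: int, mode: str) -> int:
    """Recode a genotype (0, 1, or 2 copies of the minor allele) under a mode of inheritance."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2 minor alleles")
    if mode == "additive":
        return minor_allele_count  # each copy contributes the same effect
    if mode == "dominant":
        return int(minor_allele_count >= 1)  # one copy is enough
    if mode == "recessive":
        return int(minor_allele_count == 2)  # two copies are required
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("additive", "dominant", "recessive"):
        print(mode, [code_genotype(g, mode) for g in (0, 1, 2)])
    # additive [0, 1, 2]
    # dominant [0, 1, 1]
    # recessive [0, 0, 1]
```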

Public Health Relevance

The Human Genome Project and its offshoots have dramatically increased the amount of available genetic data. In fact, our ability to collect genetic information now far outstrips our ability to use it to understand the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, efficient computational and statistical approaches that make full use of this vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that underlie human diseases and trait variation.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
University of California Los Angeles
Biostatistics & Other Math Sci
Schools of Medicine
Los Angeles
United States
Clark, Michelle M; Blangero, John; Dyer, Thomas D et al. (2016) The Quantitative-MFG Test: A Linear Mixed Effect Model to Detect Maternal-Offspring Gene Interactions. Ann Hum Genet 80:63-80
Levine, Andrew J; Soontornniyomkij, Virawudh; Achim, Cristian L et al. (2016) Multilevel analysis of neuropathogenesis of neurocognitive impairment in HIV. J Neurovirol 22:431-41
Janowitz Koch, Ilana; Clark, Michelle M; Thompson, Michael J et al. (2016) The concerted impact of domestication and transposon insertions on methylation patterns between dogs and grey wolves. Mol Ecol 25:1838-55
Zhou, Hua; Blangero, John; Dyer, Thomas D et al. (2016) Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data. Genet Epidemiol :
Zhou, Jin J; Hu, Tao; Qiao, Dandi et al. (2016) Boosting Gene Mapping Power and Efficiency with Efficient Exact Variance Component Tests of SNP Sets. Genetics :
Shringarpure, Suyash S; Bustamante, Carlos D; Lange, Kenneth et al. (2016) Efficient analysis of large datasets and sex bias with ADMIXTURE. BMC Bioinformatics 17:218
Crawford, Forrest W; Stutz, Timothy C; Lange, Kenneth (2016) Coupling bounds for approximating birth-death processes by truncation. Stat Probab Lett 109:30-38
Zhou, Hua; Zhou, Jin; Hu, Tao et al. (2016) Genome-wide QTL and eQTL analyses using Mendel. BMC Proc 10:239-244
Paul, Kimberly C; Rausch, Rebecca; Creek, Michelle M et al. (2016) APOE, MAPT, and COMT and Parkinson's Disease Susceptibility and Cognitive Symptom Progression. J Parkinsons Dis 6:349-59
Brown, Robert; Lee, Hane; Eskin, Ascia et al. (2016) Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders. Eur J Hum Genet 24:113-9

Showing the most recent 10 out of 137 publications