For seventeen years this grant has developed statistical and computational tools vital to gene mapping. During that period, technology and genomic data have changed dramatically. Expression and genotyping chips have become standard scientific tools, and the full genomes from a host of organisms, including the human species, have been sequenced. Computers have continued to grow exponentially in speed and memory. These parallel advances have powered hundreds of successful human gene mapping studies for both Mendelian and complex traits. Unfortunately, these successes have shed light on only a small fraction of the genetic heritability of complex traits. This is hardly surprising as current technology stresses common SNPs, and selection tends to drive common deleterious mutations to extinction. There are several candidates for the missing dark matter of genetic epidemiology. Among these are (a) copy number variants, (b) polygenes of small effect, (c) missed interactions among genes and between genes and environment, (d) epigenetic effects, (e) variation across populations, (f) rare variants, and (g) non-coding RNA. As sequencing costs continue to decline rapidly, the search for rare variants via large-scale sequencing is perhaps the most promising new route to disease gene discovery. In the next cycle of this grant, we plan to build on our successes, with particular stress on mining the growing avalanche of sequence data. The statistical analysis of sequence data is surely one of the most complex undertakings in all of modern biology. Currently, the data being generated are in danger of being squandered due to a lack of good analysis tools. Beyond raw sequence data, interesting connections are being forged by the bioinformatic and functional genomic communities. We desperately need to bring this accumulated knowledge in mutation severity prediction and gene interactions to bear on gene mapping. 
In our opinion, the extraordinarily fast, coordinate descent forms of penalized regression are the best candidate tools for successful analysis of high-dimensional sequencing data. Genetic analysis via penalized regression easily handles non-genetic predictors, uncertainty in genotype and sequence calls, corrections for ethnic admixture, quantitative traits and disease dichotomies, gene-gene and gene-environment interactions, and both rare and common variants.
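As a rough illustration of the coordinate descent idea named above, the following minimal Python sketch applies cyclic coordinate descent to a lasso-penalized least squares problem. It is not taken from the proposal and is not the MENDEL implementation. The function names, the objective scaling, the toy data, and the penalty value are all assumptions chosen for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: closed-form solution of the one-dimensional lasso."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200, tol=1e-10):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows, each a list of p predictor values; y is a list of n responses.
    (Hypothetical helper written for this sketch.)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X beta, and beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero predictor carries no information
            # Covariance of predictor j with the partial residual
            # (the residual with predictor j's own contribution added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                # Update the residual in place so each coordinate update costs O(n).
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy data (assumed): three centered, uncorrelated predictors; the trait has a
# strong effect (2.0) from predictor 0 and a weak effect (0.3) from predictor 1.
X = [[-1, -1, 1], [-1, 1, -1], [1, -1, -1], [1, 1, 1]]
y = [-2.3, -1.7, 1.7, 2.3]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)
```

On this toy design the strong effect is shrunk from 2.0 to roughly 1.5, while the weak and null predictors are set exactly to zero. Shrinkage with exact zeros of this kind is what makes the approach attractive when predictors far outnumber subjects. Each coordinate update touches only one column and the running residual, which is why the sweeps are cheap.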
Our first aim is to extend our penalized regression algorithms to incorporate prior biological knowledge at the variant level; distinguish modes of inheritance at the gene level; capture multivariate phenotypes; and exploit network information in interaction testing. Additional aims of this proposal include new methods to use sequence data to rule out variants' involvement in Mendelian traits; extensions to our tests for intergenerational effects; and more efficient algorithms for genome-wide association tests based on pedigree data. Finally, we will implement all these innovations in our mature, freely distributed, statistical genetics package MENDEL.

Public Health Relevance

The Human Genome Project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information has far outstripped our ability to use it to understand the basis of disease and human diversity.
Our aim is to develop, implement, and freely distribute new, efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and to trait variation.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
2R01GM053275-18
Application #
8238834
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Project Start
1995-08-01
Project End
2016-03-31
Budget Start
2012-04-01
Budget End
2013-03-31
Support Year
18
Fiscal Year
2012
Total Cost
$551,431
Indirect Cost
$182,867
Name
University of California Los Angeles
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
092530369
City
Los Angeles
State
CA
Country
United States
Zip Code
90095
Brown, Robert; Pasaniuc, Bogdan (2014) Enhanced methods for local ancestry assignment in sequenced admixed individuals. PLoS Comput Biol 10:e1003555
Kaser, Arthur; Pasaniuc, Bogdan (2014) IBD genetics: focus on (dys) regulation in immune cells and the epithelium. Gastroenterology 146:896-9
Kichaev, Gleb; Yang, Wen-Yun; Lindstrom, Sara et al. (2014) Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 10:e1004722
Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo et al. (2014) Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30:2906-14
Lange, Kenneth; Papp, Jeanette C; Sinsheimer, Janet S et al. (2014) Next Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data. Annu Rev Stat Appl 1:279-300
Rañola, John Michael; Novembre, John; Lange, Kenneth (2014) Fast spatial ancestry via flexible allele frequency surfaces. Bioinformatics 30:2915-22
Han, Eunjung; Sinsheimer, Janet S; Novembre, John (2014) Characterizing bias in population genetic inferences from low-coverage sequencing data. Mol Biol Evol 31:723-35
Chi, Eric C; Lange, Kenneth (2014) A Look at the Generalized Heron Problem through the Lens of Majorization-Minimization. Am Math Mon 121:95-108
Lange, Kenneth (2014) Hadamard's Determinant Inequality. Am Math Mon 121:258-259
Ko, Arthur; Cantor, Rita M; Weissglas-Volkov, Daphna et al. (2014) Amerindian-specific regions under positive selection harbour new lipid variants in Latinos. Nat Commun 5:3983

Showing the most recent 10 out of 97 publications