For nearly two decades this grant has developed statistical and computational tools vital to gene mapping. During that period, technology and genomic data changed dramatically. Expression and genotyping chips became standard scientific tools; the full genomes of a host of organisms, including humans, were sequenced; and low-cost sequencing went from fantasy to reality. The last decade has also witnessed a shift in focus from common to rare SNVs (single nucleotide variants) and from moderate-sized studies to large consortium studies. Simultaneously, computers have grown exponentially in speed and memory. These parallel advances have powered thousands of successful human gene mapping studies for both Mendelian and complex traits. Because these successes have shed light on only a fraction of the heritability of common traits, we have not yet reached the endgame of statistical genetics. There is still a need for new ideas and better software. We plan to build on our previous successes, with particular stress on adapting modern methods of data mining to genetic applications. We and others have made great strides in applying penalized estimation and model selection in genomics. Genetic analysis via penalized regression easily handles non-genetic predictors, uncertainty in genotype and sequence calls, corrections for ethnic admixture, quantitative traits and disease dichotomies, gene-gene and gene-environment interactions, and both rare and common variants. Unfortunately, it is now apparent that penalized estimation is hampered by severe shrinkage and inflated false positive rates. Our recent development of proximal distance algorithms and of regression guided by the Akaike information criterion (AIC) shows that severe shrinkage can be eliminated and false positive rates tamed.
We are also convinced that haplotypes have been underexploited in genetic analysis. Haplotypes flag local gene sharing, serve as surrogates for rare variants, capture intragenic interactions, and enable both fixed-effects and random-effects QTL (quantitative trait locus) mapping. Our extensive list of aims should not be interpreted as a lack of focus. Our track record shows that we can make progress on a number of fronts simultaneously. All of our efforts are directed toward sharpening the tools of genetic analysis. As our programs SIMWALK, MENDEL, and ADMIXTURE illustrate, we are committed to translating theoretical advances into user-friendly software. These programs are notable for their comprehensiveness, speed, reliability, small memory usage, and detailed documentation. The goal of this grant is to empower the very large genetic studies on the horizon. Collectively, our Specific Aims go a long way toward that goal.
The Human Genome Project and its offshoots have dramatically increased the amount of genetic data. In fact, the scientific community's ability to collect genetic information has now far outstripped its ability to use this information to understand the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and trait variation.