This project aims to develop adaptively powerful testing procedures for high-dimensional data, with applications in genetics, genomics and neuroimaging. Recent biotechnological advances have produced large amounts of high-throughput, high-dimensional molecular and imaging data, raising a number of new and challenging statistical questions. One such question is how polygenic testing in genome-wide association studies (GWAS) can be used to determine whether some of the millions of genetic variants are associated with a complex disease such as Alzheimer's disease. Answering this question is important for uncovering disease-related genes, and thus for developing effective prevention and treatment strategies. Rigorous hypothesis testing that avoids false discoveries while maximizing the chance of true discoveries is critical to modern genetic, genomic and other omic studies. The methods will be applied to data related to Alzheimer's disease, for which there is currently no cure and for which more powerful analysis methods are urgently needed to unravel the underlying biology. Graduate students will be involved in conducting the research and developing the computational tools, and publicly available software packages will be developed for use by other biomedical researchers.
This research will advance the frontiers of modern statistical methodology in hypothesis testing with high-dimensional data and in the related problem of rare-event assessment. Powerful adaptive methods will be developed for testing high-dimensional mean parameters in generalized linear models, as well as high-dimensional covariance matrix structures. The adaptive test statistics are constructed from high-dimensional, high-order von Mises V-statistics and U-statistics, and will provide uniformly high power against sparse, dense, and moderately sparse or dense signals under flexible asymptotic regimes. Another thrust of the research addresses the challenging and important problem of rare-event estimation in the analysis of genome-wide molecular and neuroimaging data, where a highly stringent statistical significance level is usually required. To evaluate such small probabilities, the research will develop theoretical tail probability approximations as well as efficient Monte Carlo methods based on non-standard change-of-measure techniques.
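To illustrate why a change of measure helps with very small probabilities, the following is a minimal sketch of textbook importance sampling for a standard normal tail probability P(Z > t). It is not the project's method: the mean-shift proposal, the function names, and the threshold t = 5 are all assumptions chosen for illustration, and the non-standard techniques mentioned above are presumably more elaborate.

```python
import math
import numpy as np


def tail_prob_importance_sampling(t, n_samples=100_000, seed=0):
    """Estimate P(Z > t) for Z ~ N(0, 1) by sampling from N(t, 1).

    Illustrative assumption: a simple mean shift to the threshold t,
    which is the standard textbook change of measure for this problem.
    """
    rng = np.random.default_rng(seed)
    # Draw from the shifted proposal so that the event {x > t} is common.
    x = rng.normal(loc=t, scale=1.0, size=n_samples)
    # Likelihood ratio of target to proposal:
    # phi(x) / phi(x - t) = exp(-t * x + t**2 / 2).
    weights = np.exp(-t * x + 0.5 * t * t)
    values = (x > t) * weights
    estimate = values.mean()
    std_error = values.std(ddof=1) / math.sqrt(n_samples)
    return estimate, std_error


def exact_normal_tail(t):
    """Exact P(Z > t) via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


if __name__ == "__main__":
    t = 5.0  # hypothetical threshold, chosen only for illustration
    est, se = tail_prob_importance_sampling(t)
    print(f"importance sampling: {est:.3e} (std. error {se:.1e})")
    print(f"exact value:         {exact_normal_tail(t):.3e}")
```

Plain Monte Carlo with the same number of draws from N(0, 1) would almost always observe no exceedances of t = 5 and so return an estimate of zero, whereas the shifted proposal places about half of its draws in the tail and corrects for the shift through the likelihood-ratio weights.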