Large-scale multiple testing is an important and rapidly growing area of modern statistics. The proposed research develops new theory, methodology and computational algorithms to address fundamental questions and new challenges in this field. The investigator develops new concepts, data-driven schemes and supporting theory that promise to improve statistical efficiency and lay the foundation for simultaneous inference in large-scale studies, especially when heterogeneity, dependence and other complex structures are present. The major components of the proposed research are: (i) the concept of simultaneously incorporating statistical significance and effect size in multiple testing, and a new approach to identifying large non-null effects in heteroscedastic models; (ii) the strategy of exploiting spatial dependence, and a new approach to testing correlated hypotheses in a hidden Markov random field; (iii) the strategy of grouping hypotheses into sets, and a new approach to testing the significance of multiple groups of important variables; and (iv) the concepts of discovery boundary and effective screening, and a data-driven approach to reducing dimensionality by constructing subsets that are optimal in size and adaptive to unknown sparsity.

The proposed research has a significant impact on many scientific applications, including genome-wide association studies, time-course microarray experiments, disease mapping in environmental studies, climate modeling, and medical imaging studies. The multiple testing and screening methods outlined in the proposal will improve the quality of simultaneous decision-making in complicated situations, yield more interpretable and reproducible scientific results, and substantially reduce costs in large-scale investigations. They thereby help achieve the ultimate goal of understanding the underlying mechanisms of complex systems and human diseases in a precise, fast and cost-effective way. User-friendly software will be developed and made freely available for public use. Research results will be disseminated through publications, seminars and workshops. The investigator is committed to encouraging the participation of under-represented groups in science, and to integrating the proposed research into educational activities by developing new courses and by mentoring and training students to work on the frontiers of statistics with important health science applications.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1007675
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2010-08-01
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2010
Total Cost: $107,176
Indirect Cost:
Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695