This research project aims to develop statistical theory and practical methodology for complex high-dimensional clustered data in which the number of variables exceeds the sample size. This problem is especially important in microarray studies, which involve thousands of genes. The research will focus on how to extract information efficiently and accurately from large quantities of often noisy high-dimensional data, so as to identify and select significant variables of scientific interest. The PI and her collaborators will develop estimation procedures, statistical inference functions, and model selection and classification procedures that incorporate correlation into the models. The specific goals of this research plan are: (1) to propose flexible procedures for estimating the link function and the marginal variance function in generalized linear models when their forms are unknown; (2) to develop semiparametric classification methods for time-course gene expression data; (3) to propose model selection criteria for choosing informative correlation structures; (4) to develop efficient and consistent model selection procedures for generalized additive models in which the likelihood is unspecified; and (5) to develop a sufficient dimension reduction method for correlated data that retains the full regression information without imposing parametric models.

The research project will help to tackle fundamental questions in statistical science and will stimulate interest among a large group of scientists working in longitudinal and clustered data analysis. It will also enhance the development of, and make connections between, theory and methodology in statistics, biostatistics and computer science. This research will have significant impact, with applications in biomedical studies, genome research, econometrics, environmental studies, oceanography, social science and public health, where correlated data often arise. The PI will integrate the proposed research substantially into educational activities by developing new university courses and by presenting short courses at major statistical meetings. The research will also advance the training of undergraduate and graduate students in handling high-dimensional correlated data.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0906660
Program Officer: Gabor J. Szekely
Budget Start: 2009-09-01
Budget End: 2013-08-31
Fiscal Year: 2009
Total Cost: $210,144
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820