This research project aims to develop statistical theory and practical methodology for complex, high-dimensional correlated data in settings where the full parametric likelihood function of the model is intractable or difficult to specify, and where part of the data is inaccurate or missing. The PI and her collaborators will develop efficient and robust estimation procedures by incorporating correlation structures into models with high-dimensional nuisance parameters, and will develop inference functions for hypothesis testing that have low computational cost. The research goals of the 5-year plan include: to provide an explicit maximum number of contaminated clusters that can be tolerated while maintaining the consistency of the estimator based on quadratic inference functions; to develop unbiased and efficient estimating functions for responses that are missing at random, along with inference functions for testing the model assumption; to develop an efficient estimator using a nonparametric regression spline with relatively low computational demand, and to introduce a goodness-of-fit test with a chi-squared property for testing whether coefficients in nonparametric regression are time-varying or time-invariant; and to develop semi-nonparametric models for cell cycle microarray data that incorporate both temporal correlation within genes and correlation between biologically related genes.
This research will have significant impact and many applications in biomedical research, econometrics, environmental studies, oceanography, social science, and public health, where correlated data arise often. The outlined research projects will help address fundamental questions in statistical science and will stimulate interest from a large group of scientists. The research also connects theory and methods developed in econometrics, statistics, and biostatistics. It will benefit biomedical research aimed at combating life-threatening diseases such as AIDS and cancer, and will contribute to identifying cell-cycle-regulated genes more accurately. It will integrate the current state of knowledge in the proposed research areas into educational activities through the development of new courses on nonparametric methods and microarray data analysis, and will advance undergraduate and graduate students' learning and training in semiparametric and nonparametric methods. Furthermore, it will broaden opportunities and enable the participation of all citizens from various disciplines, including underrepresented minorities, and will support international partnerships.