This proposal aims to develop new theory and methodology for sufficient dimension reduction (SDR). In particular, the research focuses on biostatistical problems, which commonly involve missing data, survival data, and longitudinal data. The proposed methodology reduces a high-dimensional regression problem to a low-dimensional projection, retains full regression information, and imposes few or no probabilistic model assumptions. There are three components to this research. First, the investigator proposes a family of augmented inverse probability weighted SDR estimators for settings where predictors have missing observations. This new approach allows a more general missing data mechanism than the existing solution and permits more flexible regression forms beyond the homoscedastic linear model. Second, the investigator targets SDR for survival data, where the response, time to death or disease recurrence, is subject to censoring. Viewing the censored response as a specific type of missing data, the investigator integrates an inverse probability weighted estimation strategy with a variety of SDR methods. Third, the investigator studies a type of longitudinal data in which measurements for all study subjects are collected at the same scheduled time points. Both a population foundation and the associated estimation procedure are developed. All three components center on commonly encountered biostatistical problems, and their development is interrelated.
Modern technologies have pushed the frontier of science by making it possible to generate and collect data in large quantities and of high dimensionality. Large, high-dimensional data sets arise in many research areas, such as environmental studies, human health and medical research, and homeland security. Sufficient dimension reduction (SDR) methodology effectively transforms a high-dimensional data problem into a low-dimensional one. Consequently, SDR allows many existing analytical methods, previously hindered by the curse of dimensionality, to be applied to high-dimensional problems. In addition, informative visualization of the data often becomes possible after dimension reduction, facilitating both the understanding and the analysis of the data. By developing new theory and methodology for missing, censored, and correlated data, the investigator's research extends the boundary of SDR to biostatistics as well as to other disciplines such as econometrics, finance, and bioinformatics. The impact of this research is anticipated to be widespread, owing to the prevalence of high-dimensional data and the urgent demand for effective analytical tools to analyze such data.