Instead of modeling the hazard function for censored survival data, as the well-known Cox model does, modeling the survival time directly through a suitable transformation has become increasingly appealing to practitioners because it postulates a simple relationship between the response variable and the covariates, with easily interpretable parameters. As a special case, the semiparametric accelerated failure time model, which transforms the failure time by the logarithm, has been studied extensively in the past decade. Open challenges for these semiparametric linear transformation models include semiparametric efficient estimation, asymptotic theory under more realistic conditions that may lead to good properties for survival time prediction, a measurability issue in the stochastic integral formulation of outcome-dependent weighted estimating methods, and high-dimensional data analysis. The investigator proposes new methods to tackle these issues in semiparametric linear transformation models. Asymptotic theory will be established using modern empirical process theory. Numerical implementations of all the proposed methods will be based either on well-developed algorithms for discrete estimating functions or on the Newton-Raphson method for smooth objective functions in which the infinite-dimensional parameter is approximated by a smoothing estimator. To enhance predictive ability, more flexible transformations are considered for problems with high-dimensional data. Penalized methods will be investigated in order to achieve simultaneous variable selection and survival time prediction.
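For concreteness, the class of models discussed above can be sketched in generic notation. The symbols below (T for the failure time, C for the censoring time, Z for the covariate vector, H for the transformation, and epsilon for the error term) are illustrative assumptions and are not taken from the proposal itself; the exact assumptions placed on H and on the error distribution vary across the literature.

```latex
% Illustrative notation only; not the proposal's own formulation.
% Linear transformation model: a monotone transformation H of the
% failure time T is linear in the covariates Z.
\[
  H(T) = \beta^{\top} Z + \varepsilon ,
  \qquad \varepsilon \text{ independent of } Z .
\]
% Accelerated failure time (AFT) model as the special case H = \log,
% with the distribution of \varepsilon left unspecified.
\[
  \log T = \beta^{\top} Z + \varepsilon .
\]
% Under right censoring, only the following pair is observed
% together with Z:
\[
  Y = \min(T, C), \qquad \Delta = \mathbf{1}\{T \le C\} .
\]
```

In the accelerated failure time case the parameters have a direct reading on the time scale: a one-unit increase in the j-th covariate multiplies the survival time by exp(beta_j), which is the kind of easily interpretable relationship referred to above.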
The statistical models considered in this project have important applications in a wide spectrum of disciplines, including biology, medicine, health studies, and engineering. The proposed research is particularly motivated by the multi-cohort study of women's reproductive life staging, in which the prediction of age at menopause is of major interest, and by the Michigan ovarian cancer and lung cancer studies, which seek relevant genes and good models for predicting patients' survival from gene expression data. The research will provide methods that use data more efficiently and yield more precise predictions. It will also allow the investigator to add more thorough statistical results to graduate courses on advanced survival analysis and semiparametric models. The proposed research activities will motivate graduate students to become independent researchers who are able to engage in fundamental statistical research.