Linear models analysis is one of the most appealing statistical methods because its results are directly interpretable. The accelerated failure time (AFT) model and the censored quantile regression (QR) model serve as counterparts, for censored data, of the classical linear model and the uncensored QR model, and they complement the Cox proportional hazards model. Censored QR, in particular, enriches linear models analysis for censored data by allowing covariate effects to vary across the distribution of event times. Other regression methods constrain the covariate effects to be constant and can fail to provide consistent results when the effects in fact vary. In contrast, censored QR allows, for example, a treatment effect to be negative for more severe cases (with shorter event-free survival times) but positive in other cases. The AFT and censored QR models are, however, under-utilized because flexible and general methods for estimation, variable selection and inference are lacking. This investigation includes developing (A) flexible estimation methods that work under less stringent conditions than those required by existing methods, (B) methods for variable selection, including for high-dimensional data, and (C) general empirical likelihood (EL) methods parallel to those for the uncensored case. In addition, although the methods are developed under a random right-censoring mechanism, the general ideas of the proposed research and the methods developed are applicable to truncation and to other types of censoring.
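
For orientation, the two models named above are commonly written as shown below. This is a standard textbook formulation and not notation taken from the proposal: the symbols T_i (event time), C_i (censoring time), x_i (covariate vector), beta, epsilon_i (error term) and tau (quantile level) are assumptions introduced here for illustration only.

```latex
\begin{align*}
% AFT model: a single coefficient vector acts on the log event time
\log T_i &= x_i^\top \beta + \varepsilon_i , \\
% Censored QR: the coefficient vector may change with the quantile level tau
Q_{\log T_i}(\tau \mid x_i) &= x_i^\top \beta(\tau), \qquad 0 < \tau < 1 , \\
% Under random right censoring, only the pair (Y_i, delta_i) is observed
Y_i &= \min(T_i, C_i), \qquad \delta_i = \mathbf{1}\{T_i \le C_i\} .
\end{align*}
```

In this notation, a treatment coefficient \beta_j(\tau) that is negative at small \tau and positive at larger \tau corresponds to the pattern described above: an adverse effect for cases with shorter event-free survival and a beneficial effect elsewhere.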

Improving statistical models for predicting medical outcomes is always an important part of statistical research. Thanks to recent advances in high-throughput technologies, a vast amount of potentially useful information, including patients' gene profiles, is available and is anticipated to lead to much improved prediction. The proposed study investigates novel methods for incorporating such data into a better statistical model that predicts a patient's survival more accurately. The types of models to be investigated are also more sophisticated: instead of predicting only an "average" person's survival, they allow prediction for the "top 10%" or the "bottom 10%", while allowing survival in these groups to be affected very differently by the gene profile.

Project Report

Time-to-event outcomes are of interest in many research studies. Linear models analysis is one of the most appealing methods because its results are directly interpretable. However, it is under-utilized for censored time-to-event data because flexible and general methods for estimation, variable selection and inference, among others, are lacking. The proposed research addresses these issues by developing (A) flexible estimation methods that work under less stringent conditions than those required by existing methods, (B) methods for variable selection, and (C) general EL methods parallel to those for the uncensored case. The individual themes of this project are important in their own right and also from a broader perspective. The results will be widely applicable to the many scientific and medical problems for which a lack of estimation, variable selection and inference tools has precluded linear models analysis under censoring. For example, survival time or time to relapse commonly appears as the primary outcome in biomedical studies. Standard statistical analytic tools are constrained by rigid assumptions and arbitrary requirements. One such constraint is the proportionality of hazards, which requires the hazard of one group to be uniformly higher than those of the comparison groups throughout the lifespan of interest. In reality such uniformity is rare: the hazard of one group may be higher or lower in the short term but converge to the hazard of the comparison groups over time. We have developed modeling and inference tools for such changing hazards. We allow potentially different short-term and long-term hazards, which are conceptually intuitive and interpretable. The proposed research will also contribute to advancing other sciences in which continuing development of high-throughput technologies will make high-dimensional data routinely available.
In particular, many studies in biomedical science aim to link a time-to-event outcome with genomic, proteomic or imaging data, both to obtain a system-level understanding of the underlying biological or biochemical process and to build a predictive model for future events. The variable selection methods proposed in this research can be readily extended to regularized estimation problems for such data. They will lead to better model selection methods and thus help to build more accurate prediction models, for example for the survival times of cancer patients or the time until a second cardiac event.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1007535
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2010-10-01
Budget End: 2013-09-30
Support Year:
Fiscal Year: 2010
Total Cost: $120,838
Indirect Cost:
Name: Children's Hospital Medical Center
Department:
Type:
DUNS #:
City: Cincinnati
State: OH
Country: United States
Zip Code: 45229