The proposed research seeks to develop new statistical methods for assessing the performance of prediction models for cancer risk and prognosis when the endpoint of interest, such as patient survival or time to cancer recurrence, is subject to potentially dependent censoring, which is often present in observational and epidemiological studies. The significance of prediction models for cancer risk and prognosis is well established: they can be used to identify individuals at high risk; to plan interventional trials and subsequently design and improve personalized prevention and treatment strategies; and to estimate the population burden and cost of cancer and the impact of potential interventions and treatments. To identify optimal (or better) prediction models, it is crucial to develop robust predictive accuracy metrics for assessing and comparing them. In the presence of dependent censoring, predictive accuracy metrics that do not adjust for the censoring mechanism are likely to give a biased assessment of prediction models. While a considerable amount of work has been reported on the development of predictive accuracy metrics, only limited work addresses censored data, and most of that work assumes independent censoring and is limited to Cox proportional hazards models. In addition, owing to major advances in technology, high-dimensional biomarkers such as genomic and proteomic data are increasingly collected in cancer research studies, and modern statistical methods have been developed to use these data when constructing prediction models; this presents another challenge for assessing predictive accuracy in the presence of dependent censoring.
These considerations lead to the following specific aims: 1) develop new metrics that account for the censoring mechanism when assessing the predictive accuracy of regression models for cancer endpoints that are subject to dependent censoring; 2) develop new metrics that account for the censoring mechanism when assessing the predictive accuracy of accelerated failure time models for cancer endpoints that are subject to dependent censoring; 3) develop sensitivity analysis for the case where censoring may depend on unobserved survival times; and 4) perform a systematic evaluation of predictive accuracy metrics for censored data through extensive simulations and real data analysis. The proposed statistical methods, once developed, will allow the predictive accuracy of prediction models to be assessed under a wide range of settings, including different censoring mechanisms and high-dimensional data. The proposed numerical studies will shed light on the applicability, advantages, and disadvantages of different metrics, as well as the impact of the censoring mechanism on these metrics, and will thereby give cancer researchers better guidance on how to use and interpret these metrics in research studies and in practice.

Public Health Relevance

The objective is to develop statistical methods for assessing prediction models for cancer endpoints that are subject to dependent censoring in observational and epidemiological studies. The proposed statistical methods will allow predictive accuracy to be assessed under a wide range of settings, including different censoring mechanisms and high-dimensional data. The proposed numerical studies will shed light on the applicability, advantages, and disadvantages of different metrics, as well as the impact of the censoring mechanism on these metrics, and will thereby give cancer researchers better guidance on how to use and interpret these metrics in research studies and in practice.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Small Research Grants (R03)
Project #: 5R03CA173770-02
Application #: 8606737
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Mariotto, Angela B
Project Start: 2013-02-01
Project End: 2016-01-31
Budget Start: 2014-02-01
Budget End: 2016-01-31
Support Year: 2
Fiscal Year: 2014
Total Cost:
Indirect Cost:

Name: Emory University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30322

Chang, Changgee; Kundu, Suprateek; Long, Qi (2018) Scalable Bayesian variable selection for structured high-dimensional data. Biometrics :
Safo, Sandra E; Li, Shuzhao; Long, Qi (2018) Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics 74:300-312
Zhao, Yize; Kang, Jian; Long, Qi (2018) Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data. IEEE/ACM Trans Comput Biol Bioinform 15:537-550
Pellegrini, Kathryn L; Sanda, Martin G; Patil, Dattatraya et al. (2017) Evaluation of a 24-gene signature for prognosis of metastatic events and prostate cancer-specific mortality. BJU Int 119:961-967
Hu, Yi-Juan; Schmidt, Amand F; Dudbridge, Frank et al. (2017) Impact of Selection Bias on Estimation of Subsequent Event Risk. Circ Cardiovasc Genet 10:
Li, Ziyi; Safo, Sandra E; Long, Qi (2017) Incorporating biological information in sparse principal component analysis with application to genomic data. BMC Bioinformatics 18:332
Deng, Yi; Zhang, Xiaoxi; Long, Qi (2017) Bayesian modeling and prediction of accrual in multi-regional clinical trials. Stat Methods Med Res 26:752-765
Zhao, Yize; Chung, Matthias; Johnson, Brent A et al. (2016) Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence. J Am Stat Assoc 111:1427-1439
Torres, Mylin A; Yang, Xiaofeng; Noreen, Samantha et al. (2016) The Impact of Axillary Lymph Node Surgery on Breast Skin Thickening During and After Radiation Therapy for Breast Cancer. Int J Radiat Oncol Biol Phys 95:590-596
Wang, Ming; Long, Qi (2016) Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic. Biometrics 72:897-906
