In cancer prognosis, beyond the main effects of environmental/clinical (E) and genetic (G) risk factors, the interactions between G and E factors (G*E interactions) and those between G and G factors (G*G interactions) also play critical roles. The existing findings are insufficient, and there is a strong need to identify more prognostic interactions. Most of the existing effort has focused on data collection, whereas the development of effective analysis methods has lagged behind. Compared to data collection, methodological development requires far fewer resources but is equally critical for making reliable findings. Most of the existing interaction analysis methods lack robustness. In practice, data contamination and model mis-specification are not uncommon and can lead to severely biased parameter estimation and false marker identification. The development of robust genetic interaction analysis methods has been very limited. There are a few methods for case-control data, but they are not applicable to prognosis data. For interaction analysis with prognosis data, there has been some very recent progress in quantile regression and rank-based methods, but the development has been limited and unsystematic. Last but not least, the existing robust methods share the drawback of adopting ineffective marker selection techniques. Our group has been at the frontier of developing robust interaction analysis methods. Our statistical investigations and simulations have provided convincing evidence that robust methods using the penalization technique outperform alternatives, with significantly more accurate marker identification and model parameter estimation. In data analysis, important interactions missed by the existing analyses have been identified for multiple cancer types.
However, we have also found that the scope of the existing studies needs to be significantly expanded in terms of both methodological development and data analysis. This project is motivated by the importance of interactions in cancer prognosis and the limitations of the existing studies. Our objectives are as follows.
(Aim 1) Develop novel marginal analysis methods that are robust to data contamination and model mis-specification for identifying important interactions.
(Aim 2) Develop novel joint analysis methods that are robust to data contamination and model mis-specification for identifying important interactions.
(Aim 3) Develop tailored inference approaches to draw more definitive conclusions on the identified interactions.
(Aim 4) Develop public R software and a dynamic project website. Identify prognostic interactions for multiple cancers. For the identified interactions, we will conduct extensive bioinformatic and statistical analyses, evaluations, and comparisons. With our unique expertise, extensive experience, and promising preliminary studies, this project has a high likelihood of success.

Public Health Relevance

This study will be the first to systematically develop and implement novel robust methods for identifying gene-environment and gene-gene interactions for cancer prognosis. With methodological advancements and extensive data analysis, important genetic interactions missed by the existing studies will be identified for multiple cancer types.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
5R01CA204120-03
Application #
9512805
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Divi, Rao L
Project Start
2016-07-01
Project End
2020-06-30
Budget Start
2018-07-01
Budget End
2019-06-30
Support Year
3
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Yale University
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
043207562
City
New Haven
State
CT
Country
United States
Zip Code
Shi, Xingjie; Huang, Yuan; Huang, Jian et al. (2018) A Forward and Backward Stagewise Algorithm for Nonconvex Loss Functions with Adaptive Lasso. Comput Stat Data Anal 124:235-251
Teran Hidalgo, Sebastian J; Zhu, Tingyu; Wu, Mengyun et al. (2018) Overlapping clustering of gene expression data using penalized weighted normalized cut. Genet Epidemiol 42:796-811
Xu, Yaqing; Wu, Mengyun; Zhang, Qingzhao et al. (2018) Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach. Genomics :
Wu, Cen; Jiang, Yu; Ren, Jie et al. (2018) Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Stat Med 37:437-456
Wu, Mengyun; Huang, Jian; Ma, Shuangge (2018) Identifying gene-gene interactions using penalized tensor regression. Stat Med 37:598-610
Yue, Mu; Li, Jialiang; Ma, Shuangge (2018) Sparse boosting for high-dimensional survival data with varying coefficients. Stat Med 37:789-800
Zang, Yangguang; Zhao, Qing; Zhang, Qingzhao et al. (2017) Inferring gene regulatory relationships with a high-dimensional robust approach. Genet Epidemiol 41:437-454
Chai, Hao; Shi, Xingjie; Zhang, Qingzhao et al. (2017) Analysis of cancer gene expression data with an assisted robust marker identification approach. Genet Epidemiol 41:779-789
Zhang, Qingzhao; Duan, Xiaogang; Ma, Shuangge (2017) Focused Information Criterion and Model Averaging with Generalized Rank Regression. Stat Probab Lett 122:11-19
Wu, Mengyun; Zang, Yangguang; Zhang, Sanguo et al. (2017) Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genet Epidemiol 41:523-554

Showing the most recent 10 out of 12 publications