In cancer prognosis, beyond the main effects of environmental/clinical (E) and genetic (G) risk factors, the interactions between G and E factors (G*E interactions) and those between G and G factors (G*G interactions) also play critical roles. The existing findings are insufficient, and there is a strong need to identify more prognostic interactions. Most of the existing effort has focused on data collection, while the development of effective analysis methods has lagged behind. Compared to data collection, methodological development requires far fewer resources but is equally critical for making reliable findings. Most of the existing interaction analysis methods share the limitation of lacking robustness properties. In practice, data contamination and model mis-specification are not uncommon and can lead to severely biased model parameter estimation and false marker identification. The development of robust genetic interaction analysis methods has been very limited. There are a few methods for case-control data, but they are not applicable to prognosis data. For interaction analysis with prognosis data, there has been some very recent progress in quantile regression and rank-based methods, but the development has been limited and unsystematic. Last but not least, the existing robust methods share the drawback of adopting ineffective marker selection techniques. Our group has been at the frontier of developing robust interaction analysis methods. Our statistical investigations and simulations have provided convincing evidence that robust methods using the penalization technique outperform alternatives, with significantly more accurate marker identification and model parameter estimation. In data analysis, important interactions missed by the existing analyses have been identified for multiple cancer types.
However, we have also found that the scope of the existing studies needs to be significantly expanded in terms of both methodological development and data analysis. This project has been motivated by the importance of interactions in cancer prognosis and limitations of the existing studies. Our objectives are as follows.
(Aim 1) Develop novel marginal analysis methods that are robust to data contamination and model mis-specification for identifying important interactions.
(Aim 2) Develop novel joint analysis methods that are robust to data contamination and model mis-specification for identifying important interactions.
(Aim 3) Develop tailored inference approaches to draw more definitive conclusions on the identified interactions.
(Aim 4) Develop public R software and a dynamic project website. Identify prognostic interactions for multiple cancers. For the identified interactions, we will conduct extensive bioinformatic and statistical analyses, evaluations, and comparisons. With our unique expertise, extensive experience, and promising preliminary studies, this project has a high likelihood of success.
This study will be the first to systematically develop and implement novel robust methods for identifying gene-environment and gene-gene interactions for cancer prognosis. With methodological advancements and extensive data analysis, important genetic interactions missed by the existing studies will be identified for multiple cancer types.