Somatic mutations play a critical role in the development and progression of cancer. Studying the association between somatic mutations and cancer-related traits is essential for understanding the genetic basis of cancer and constitutes a fundamental step towards precision oncology. However, because somatic mutations occur at low frequency and cancer is highly complex, the development of statistical methods for analyzing these data has lagged far behind. In this project, we propose to develop powerful statistical approaches for the analysis of somatic mutation data.
In Specific Aim 1, we propose methods for investigating the association between somatic mutations and multiple cancer-related traits, including continuous, binary, and survival traits. The joint analysis of multiple traits will lead to a better understanding of the etiology of cancer and substantially improve statistical power for detecting disease-associated mutations.
In Specific Aim 2, we will develop novel methods for analyzing cancer subtypes with respect to somatic mutations. The proposed methods will be essential for understanding cancer heterogeneity and for developing personalized therapies for cancer patients.
In Specific Aim 3, we will develop statistical approaches for biological pathway analysis that includes both somatic and germline mutations. The proposed approaches can handle high-dimensional pathways and are statistically more powerful than existing approaches.
Our methods are motivated by the research projects of the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), which is the largest colorectal cancer consortium in the world. The PI and co-investigators of this proposal are directly involved in the GECCO project and hence can quickly translate statistical findings into public health benefits and new scientific explorations. The proposed methods will provide essential tools for studying somatic mutations and make significant contributions to biomedical science.

Public Health Relevance

Somatic mutation data provide exciting opportunities for deciphering the genetic basis of cancer and have the potential to transform the diagnosis, treatment, and prevention of cancer. However, due to the low frequency of somatic mutations and the highly complex nature of cancer, statistical methods for analyzing these data are severely underdeveloped. In this project, we will develop novel statistical methods for identifying somatic mutations that play critical roles in cancer development and progression. These methods will provide powerful analytic tools for cancer researchers and make a significant contribution to ongoing efforts toward precision medicine.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Chen, Huann-Sheng
Fred Hutchinson Cancer Research Center
United States