Human cancer is a dynamic disease that develops over an extended time period through the accumulation of a series of genetic alterations. Delineating the system dynamics of disease progression can significantly advance our understanding of tumor biology and lay a critical foundation for the development of improved cancer diagnostics, prognostics and targeted therapeutics. Traditionally, system dynamics is approached through time-course studies, in which the same cohort of subjects is sampled repeatedly across an entire biological process. However, due to ethical and economic constraints, it is not feasible to collect time-series data to study human cancer, and typically we can only obtain profile data from excised tumor tissues. Consequently, while major efforts continue to reveal the genomic events associated with human cancer, to date it has been difficult to put the identified changes in the context of the dynamic disease process. With the rapid development of sequencing technology, many thousands of static tumor samples are being collected in large-scale cancer studies. This provides us with a unique opportunity to develop a novel analytical strategy that uses static data, instead of time-course data, to study disease dynamics. Building on our previous work, we propose a large-scale interdisciplinary research plan to develop a series of novel methods that enable the construction of high-resolution cancer progression models from massive static data, the identification of pivotal molecular events that drive stepwise disease progression, and the visualization of identified changes in a cancer development roadmap. If successfully implemented, this work can effectively overcome the existing sampling limitations and open a new avenue of research to study cancer dynamics by using vast tissue archives, instead of performing resource-intensive or impractical time-course studies.
The developed methods will be intensively tested on 27 breast cancer datasets comprising ~9,000 samples. To our knowledge, no prior work has been performed on this scale to study breast cancer dynamics. The analysis will result in the first working model of breast cancer progression constructed by incorporating all genetic information. The constructed model can provide a foundation for the visualization of key progressive molecular events and facilitate the identification of pivotal driver genes and pathways and potential points of susceptibility for therapeutic intervention. Moreover, interrogation of the constructed model will enable us to test novel hypotheses in silico and to prioritize resources for more focused and detailed experimental investigations. We expect that our work will have a broad impact. Although in this study we focus mainly on breast cancer, the developed methods can also be used to study other cancers and other progressive human diseases, where the lack of time-series data to study system dynamics is a ubiquitous problem.

Public Health Relevance

Human cancer is one of the leading causes of death worldwide, a reflection of the fact that the molecular basis of the disease is still poorly understood. We propose a novel computational strategy to build a cancer progression model using genomic data obtained from excised tumor tissue samples (static data). The delineation of the dynamic disease process and the identification of pivotal molecular events that drive stepwise cancer progression will provide a wealth of new insights into tumor biology and guide the development of improved cancer diagnostics, prognostics and targeted therapeutics.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Li, Jerry
State University of New York at Buffalo
Schools of Medicine
United States