Next-generation sequencing (NGS) is enabling the routine, systematic analysis of somatic aberrations that accumulate in cancer genomes. Many functional mutations are structural, involving the deletion, duplication, translocation, insertion, or inversion of nucleotide sequences. Detecting these structural variations is fundamentally challenging because of the enormous number of ways a cancer genome can be altered and the widespread repeats that obstruct accurate alignment of short reads. Moreover, structural complexity is often compounded by clonal heterogeneity, i.e., mixtures of cell populations within a tumor specimen that carry different aberrations, which results in diverse structural and copy number profiles. These issues pose an unprecedented challenge to developing practically useful computational tools that can identify the presence of a structural variant and elucidate its functional and clinical relevance. To fully harness the power of NGS and to facilitate advances toward personalized medicine, we propose to develop a set of novel computational tools for detecting structural variants in heterogeneous cancer genomes. Specifically, we plan to pursue the following aims: 1) Develop novel computational tools for sensitive breakpoint detection and assembly, 2) Develop a statistical framework to characterize structural variants in heterogeneous tumors, and 3) Evaluate our tools through large-scale experimental validation and distribute them as open-source software. Our short-term goal is to accelerate the translation of the large volume of polyclonal NGS data produced by cancer genome sequencing projects, such as The Cancer Genome Atlas and the International Cancer Genome Consortium, into an improved understanding of tumor evolution and the identification of variants of functional and clinical relevance. Our long-term goal is to develop algorithms and prototypes that are usable in clinical settings for personalized diagnosis and treatment.

Public Health Relevance

The proposed project will deliver a set of computational algorithms to measure the clonal and structural complexity of data produced by next-generation genome and transcriptome sequencing of tumor cells. Such algorithms are essential for personalized diagnosis and treatment.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Li, Jerry
University of Texas MD Anderson Cancer Center
Biostatistics & Other Math Sci
Other Domestic Higher Education
United States
Chen, Ken; Chen, Lei; Fan, Xian et al. (2014) TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res 24:310-7
Wang, Yong; Waters, Jill; Leung, Marco L et al. (2014) Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512:155-60
Fan, Xian; Zhou, Wanding; Chong, Zechen et al. (2014) Towards accurate characterization of clonal heterogeneity based on structural variation. BMC Bioinformatics 15:299
Zhou, Wanding; Chen, Tenghui; Zhao, Hao et al. (2014) Bias from removing read duplication in ultra-deep sequencing experiments. Bioinformatics :
Fan, Xian; Abbott, Travis E; Larson, David et al. (2014) BreakDancer - Identification of Genomic Structural Variation from Paired-End Read Mapping. Curr Protoc Bioinformatics 2014: