Next-generation sequencing (NGS) is enabling the routine, systematic analysis of somatic aberrations that accumulate in cancer genomes. Many of the functional mutations are structural, involving the deletion, duplication, translocation, insertion, or inversion of nucleotide sequences. Detecting these structural variations is fundamentally challenging because of the enormous number of ways a cancer genome can be altered and the widespread repeats that obstruct the accurate alignment of short reads. Moreover, structural complexity is often compounded by clonal heterogeneity, i.e., a tumor specimen that mixes cell populations carrying different aberrations, which results in diverse structural and copy number profiles. These issues pose an unprecedented challenge to developing practically useful computational tools that can identify the presence of a structural variant and elucidate its functional and clinical relevance. To fully harness the power of NGS and to facilitate advances toward personalized medicine, we propose to develop a set of novel computational tools for detecting structural variants in heterogeneous cancer genomes. Specifically, we plan to pursue the following aims: 1) develop novel computational tools for sensitive breakpoint detection and assembly, 2) develop a statistical framework to characterize structural variants in heterogeneous tumors, and 3) validate our tools through large-scale experiments and distribute them as open-source software. Our short-term goal is to accelerate the analysis of the large volume of polyclonal NGS data produced by cancer genome sequencing projects such as The Cancer Genome Atlas and the International Cancer Genome Consortium, in order to improve our understanding of tumor evolution and identify variants of functional and clinical relevance. Our long-term goal is to develop algorithms and prototypes that are usable in clinical settings for personalized diagnosis and treatment.

Public Health Relevance

The proposed project will deliver a set of computational algorithms to measure the clonal and structural complexity of data produced by next-generation genome and transcriptome sequencing of tumor cells. Such algorithms are essential for personalized diagnosis and treatment.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Li, Jerry
University of Texas MD Anderson Cancer Center
Biostatistics & Other Math Sci
Other Domestic Higher Education
United States
Chen, Tenghui; Wang, Zixing; Zhou, Wanding et al. (2016) Hotspot mutations delineating diverse mutational signatures and biological utilities across cancer types. BMC Genomics 17 Suppl 2:394
Zafar, Hamim; Wang, Yong; Nakhleh, Luay et al. (2016) Monovar: single-nucleotide variant detection in single cells. Nat Methods 13:505-7
Zhou, Wanding; Chen, Tenghui; Chong, Zechen et al. (2015) TransVar: a multilevel variant annotator for precision genomics. Nat Methods 12:1002-3
Chen, Ken; Meric-Bernstam, Funda; Zhao, Hao et al. (2015) Clinical actionability enhanced through deep targeted sequencing of solid tumors. Clin Chem 61:544-53
1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74
Abyzov, Alexej; Li, Shantao; Kim, Daniel Rhee et al. (2015) Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms. Nat Commun 6:7256
Zhou, Wanding; Zhao, Hao; Chong, Zechen et al. (2015) ClinSeK: a targeted variant characterization framework for clinical sequencing. Genome Med 7:34
Malhotra, Ankit; Wang, Yong; Waters, Jill et al. (2015) Ploidy-Seq: inferring mutational chronology by sequencing polyploid tumor subpopulations. Genome Med 7:6
Zhou, Wanding; Chen, Tenghui; Zhao, Hao et al. (2014) Bias from removing read duplication in ultra-deep sequencing experiments. Bioinformatics 30:1073-1080
Chen, Ken; Chen, Lei; Fan, Xian et al. (2014) TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res 24:310-7

Showing the most recent 10 out of 15 publications