Statistical Models and Analysis of Complex Genomic Variation in Clonal Mixtures

Zhang, Nancy; Ji, Hanlee

Abstract

Next-generation DNA sequencing (NGS) approaches are widely used to study human diseases and identify causative genetic variants. Increasingly, NGS methods are being used to define biologically relevant clonal mixtures, a phenomenon frequently observed in human disease. One example is the set of tumor cell subpopulations that make up a cancer. Within a single tumor, and clearly evident across metastatic tumor sites, there are cancer cell clonal populations that are genetically distinct and carry their own unique set of somatic variants. A similar phenomenon occurs in viral infection, where an infected individual harbors multiple viral quasispecies; each quasispecies has its own unique set of genetic variants. Expansion or shrinkage of a clonal population can be measured quantitatively as a change in the allelic representation of its clonal variants. Specific cellular phenotypes are attributable to these unique clonal variants, and changes in their representation can be indicators of evolutionary processes. This is frequently the case for drug resistance in cancer and viral infections. Thus, clonal genetic variation has major implications for the pathogenesis of human disease and is increasingly being tested as a longitudinal indicator of disease progression and treatment resistance. The general availability of whole genome and deep targeted resequencing provides an opportunity to systematically analyze heterogeneous DNA mixtures that have different clonal components. However, in many cases the genetic variant of interest is present at a very small proportion (<5%), which makes delineating these clonal variants exceedingly difficult. Many widely employed NGS analysis methods are optimized for detecting variation in normal diploid genomes and are not well suited to delineating genomic variants in complex clonal mixtures.
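
The difficulty at low proportions can be made concrete with a deliberately simple model. This model is an assumption for illustration, not the authors' method. Each read at a site independently carries the variant with probability equal to the variant allele fraction (VAF). Sequencing errors are ignored. A variant counts as "detected" if at least a chosen number of reads support it. The function names `detection_power` and `clonal_fraction` are hypothetical.

The second helper converts a VAF into the fraction of cells carrying the variant. It assumes a heterozygous variant in a diploid region of a sample with no normal contamination. Under those assumptions, this is how a change in allelic representation maps onto clonal expansion or shrinkage.

```python
from math import comb


def detection_power(vaf: float, depth: int, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) at a site sequenced to `depth`.

    Hypothetical error-free binomial model: each read independently carries
    the variant with probability `vaf`.
    """
    # P(X >= k) = 1 - sum_{i<k} C(depth, i) * vaf^i * (1 - vaf)^(depth - i)
    below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - below


def clonal_fraction(vaf: float) -> float:
    """Fraction of cells carrying a heterozygous variant, assuming a diploid
    locus, no copy-number change and no normal-cell contamination."""
    return min(1.0, 2.0 * vaf)


# Require 5 supporting reads and compare a common, a subclonal and a rare variant.
for vaf in (0.25, 0.05, 0.01):
    for depth in (30, 100, 1000):
        power = detection_power(vaf, depth, min_alt_reads=5)
        print(f"VAF={vaf:.2f} (cells={clonal_fraction(vaf):.2f}) "
              f"depth={depth:4d} power={power:.3f}")
```

Under this toy model, a 1% variant is almost never supported by five reads at 30x or 100x but usually is at 1000x. Because the model ignores sequencing errors, it overstates real-world power at low VAF.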
Some genomic DNA variant classes, such as genomic rearrangements, are extremely difficult to detect in clonal mixtures. To improve the assessment of clonal variation and the evolution of specific clonal populations, we will develop innovative models and robust, sensitive statistical procedures. These methods will enable one to deconvolute genomic variation in clonal mixtures and to track clonal alterations through time and space. We will focus on improving the delineation of complex variation, such as genomic rearrangements and other structural variants, in genetic mixtures. To develop our methods, we will use heterogeneous DNA sequence data sets with in silico spike-in variants and determine the lowest detection threshold we can achieve while maintaining the best sensitivity and specificity. Subsequently, we will test these methods on NGS data sets from clinical samples, delineate clonal populations based on unique variants, and assess quantitative changes in allelic representation that reflect clonal expansion. These samples will be subjected to whole genome and targeted resequencing. Cancer-relevant samples will include tumors with matched normal DNA, as well as matched primary and metastatic tumor DNA. We will also analyze viral quasispecies in a set of clinical samples for which we have matched viral nucleic acid samples obtained longitudinally over the course of infection from a single individual. As a final milestone, we will release our methods as open source software for the biomedical research community.
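
The in silico spike-in strategy can be illustrated, in a much simplified form, by a small simulation. Everything in this sketch is an assumption for illustration. It models single-site point variants rather than the rearrangements and structural variants the project targets. It uses a uniform per-read error rate. It also uses a naive caller that reports a variant when at least `min_alt_reads` reads show the alternate allele. The names `simulate_alt_reads`, `evaluate_threshold` and `lowest_detectable_vaf` are hypothetical. So are the default target sensitivity (0.95) and specificity (0.99); these are not values from this project.

```python
import random


def simulate_alt_reads(depth: int, vaf: float, error_rate: float,
                       rng: random.Random) -> int:
    """Count reads showing the alternate allele at one simulated site."""
    count = 0
    for _ in range(depth):
        if rng.random() < vaf:
            count += 1  # read comes from a variant-carrying molecule
        elif rng.random() < error_rate:
            count += 1  # reference read miscalled as alternate (toy error model)
    return count


def evaluate_threshold(vaf: float, depth: int, error_rate: float,
                       min_alt_reads: int, n_sites: int = 500,
                       seed: int = 0) -> tuple[float, float]:
    """Estimate (sensitivity, specificity) of a naive read-count caller."""
    rng = random.Random(seed)
    # Spike-in sites carry the variant at `vaf`; null sites carry no variant.
    true_pos = sum(
        simulate_alt_reads(depth, vaf, error_rate, rng) >= min_alt_reads
        for _ in range(n_sites)
    )
    false_pos = sum(
        simulate_alt_reads(depth, 0.0, error_rate, rng) >= min_alt_reads
        for _ in range(n_sites)
    )
    return true_pos / n_sites, 1.0 - false_pos / n_sites


def lowest_detectable_vaf(vafs, depth, error_rate, min_alt_reads,
                          target_sens=0.95, target_spec=0.99):
    """Smallest spike-in VAF meeting both targets, or None if none does."""
    for vaf in sorted(vafs):
        sens, spec = evaluate_threshold(vaf, depth, error_rate, min_alt_reads)
        if sens >= target_sens and spec >= target_spec:
            return vaf
    return None


vafs = [0.005, 0.01, 0.02, 0.05, 0.10]
print(lowest_detectable_vaf(vafs, depth=500, error_rate=0.002, min_alt_reads=5))
```

Raising `min_alt_reads` trades sensitivity for specificity. The lowest detectable VAF therefore depends jointly on depth, error rate and the caller's threshold. A realistic evaluation would replace the toy caller with the method under study.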

Public Health Relevance

Breakthroughs in DNA sequencing technologies are having a major impact on the study of human diseases, and these methods are increasingly being applied to improve diagnosis and treatment. A hallmark of diseases such as cancer and viral infections is their genetic complexity. For example, even within a single patient, a cancer or viral infection is not homogeneous in its genetic composition but instead contains smaller populations that have unique genetic changes. As a result, these disease states are genetic mixtures, and determining the most important genetic changes is complicated and difficult. We will develop methods and approaches that will improve the analysis and detection of disease-related genetic changes in mixtures, with direct application to cancer and viral infections.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG006137-04
Application #: 8759269
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2011-07-06
Project End: 2017-06-30
Budget Start: 2014-09-12
Budget End: 2015-06-30
Support Year: 4
Fiscal Year: 2014
Total Cost: $258,774
Indirect Cost: $68,689

Institution

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2020 R01 HG	Single Cell Transcriptomic and Genetic Diversity by Single Molecule Long Read Sequencing Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2019 R01 HG	Genomic and Cellular Variation from Single Molecules to Single Cells Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2018 R01 HG	Genomic and Cellular Variation from Single Molecules to Single Cells Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2017 R01 HG	Genomic and Cellular Variation from Single Molecules to Single Cells Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2016 R01 HG	Statistical Models and Analysis of Complex Genomic Variation in Clonal Mixtures Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2015 R01 HG	Statistical Models and Analysis of Complex Genomic Variation in Clonal Mixtures Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2014 R01 HG	Statistical Models and Analysis of Complex Genomic Variation in Clonal Mixtures Zhang, Nancy R.; Ji, Hanlee / University of Pennsylvania	$258,774
NIH 2013 R01 HG	Statistical Models for Genome Sequencing and Association Ji, Hanlee; Zhang, Nancy R. / Stanford University	$215,645
NIH 2012 R01 HG	Statistical Models for Genome Sequencing and Association Ji, Hanlee; Zhang, Nancy R. / Stanford University	$215,498
NIH 2011 R01 HG	Statistical Models for Genome Sequencing and Association Zhang, Nancy R.; Ji, Hanlee / Stanford University	$216,462

Publications

Zhou, Zilu; Wang, Weixin; Wang, Li-San et al. (2018) Integrative DNA copy number detection and genotyping from sequencing and array-based platforms. Bioinformatics 34:2349-2355

Xia, Li Charlie; Ai, Dongmei; Lee, Hojoon et al. (2018) SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. Gigascience 7:

Urrutia, Eugene; Chen, Hao; Zhou, Zilu et al. (2018) Integrative pipeline for profiling DNA copy number and inferring tumor phylogeny. Bioinformatics 34:2126-2128

Wang, Jingshu; Huang, Mo; Torre, Eduardo et al. (2018) Gene expression distribution deconvolution in single-cell RNA sequencing. Proc Natl Acad Sci U S A 115:E6437-E6446

Zhang, Hanrui; Zhang, Nancy R; Li, Mingyao et al. (2018) First Giant Steps Toward a Cell Atlas of Atherosclerosis. Circ Res 122:1632-1634

Huang, Mo; Wang, Jingshu; Torre, Eduardo et al. (2018) SAVER: gene expression recovery for single-cell RNA sequencing. Nat Methods 15:539-542

Ai, Dongmei; Huang, Ruocheng; Wen, Jin et al. (2017) Integrated metagenomic data analysis demonstrates that a loss of diversity in oral microbiota is associated with periodontitis. BMC Genomics 18:1041

Chen, Hao; Jiang, Yuchao; Maxwell, Kara N et al. (2017) Allele-specific copy number estimation by whole exome sequencing. Ann Appl Stat 11:1169-1192

Jiang, Yuchao; Zhang, Nancy R; Li, Mingyao (2017) SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol 18:74

Lau, Billy T; Ji, Hanlee P (2017) Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes. BMC Genomics 18:745

Showing the most recent 10 out of 38 publications
