Genomic and Cellular Variation from Single Molecules to Single Cells

Zhang, Nancy; Ji, Hanlee

Abstract

Defining the features of cellular mixtures, where diverse cell types with distinct genomic characteristics are physically intermingled, is a central problem in biology. For example, diseases such as cancer are characterized by cellular masses composed of subpopulations, each with its own set of genetic variants and transcriptional signatures, where inter-population DNA variation is compounded with cell-to-cell RNA expression stochasticity. Characterizing genomic diversity in cellular mixtures and assessing its impact on cell-to-cell gene expression variation require analyses at the resolution of individual cells and contiguous genome molecules. This level of analytical resolution is now feasible with next-generation sequencing (NGS) assays that integrate molecular barcoding with single-cell RNA sequencing and single-molecule DNA sequencing. These technological advances surmount key challenges and herald new opportunities for the study of disease, but require new analysis methods: (1) Current NGS methods are not optimal for detecting and phasing genomic variants from cellular mixtures. For example, it is difficult to detect complex structural variants (SVs) that are carried by only a fraction of the genomes present within a mixture. Methods based on short-read data are hindered by the loss of long-range contiguity in heavily fragmented DNA as well as the low mappability of many SV junctions. Single-molecule linked-read DNA sequencing overcomes these drawbacks, but still lacks reliable analysis methods. (2) Single-cell RNA sequencing allows the detection of distinct cellular subpopulations with unique transcriptional signatures; however, data from individual cell transcriptomes have high levels of error and bias. New analysis procedures are needed to make statistically sound inferences. (3) The existing methods for single-cell expression analysis typically ignore DNA heterogeneity, which can be crucial for some studies, especially for cancer.
It is not yet clear how to characterize variation at the DNA and RNA levels simultaneously in a cellular mixture. This proposal addresses these issues by developing new statistical methods and experimental designs that enable accurate characterization of cellular mixtures exhibiting both DNA and RNA variation. We propose to develop methods to (1) detect, characterize, and phase complex variants using new single-molecule sequencing technology, (2) improve expression estimates obtained from single-cell RNA sequencing data, and (3) combine bulk single-molecule DNA sequencing and single-cell RNA sequencing to quantify the relationship between DNA variation and transcriptomic variation in genetically heterogeneous samples such as cancer.

Public Health Relevance

This application provides statistical and computational tools for the analysis of single-cell and single-molecule sequencing data. These tools allow more accurate profiling of genomic and cellular heterogeneity within diseased tissues such as tumors. More accurate tissue profiling forms the foundation for more accurate disease prognosis and more effective treatment.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006137-09
Application #: 9729025
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Brooks, Lisa

Project Start: 2011-07-06
Project End: 2020-06-30
Budget Start: 2019-07-01
Budget End: 2020-06-30
Support Year: 9
Fiscal Year: 2019

Institution

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2020 R01 HG	Single Cell Transcriptomic and Genetic Diversity by Single Molecule Long Read Sequencing Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2019 R01 HG	Genomic and Cellular Variation from Single Molecules to Single Cells Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2018 R01 HG	Genomic and Cellular Variation from Single Molecules to Single Cells Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2017 R01 HG	Genomic and Cellular Variation from Single Molecules to Single Cells Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2016 R01 HG	Statistical Models and Analysis of Complex Genomic Variation in Clonal Mixtures Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2015 R01 HG	Statistical Models and Analysis of Complex Genomic Variation in Clonal Mixtures Zhang, Nancy R.; Ji, Hanlee P. / University of Pennsylvania
NIH 2014 R01 HG	Statistical Models and Analysis of Complex Genomic Variation in Clonal Mixtures Zhang, Nancy R.; Ji, Hanlee / University of Pennsylvania	$258,774
NIH 2013 R01 HG	Statistical Models for Genome Sequencing and Association Ji, Hanlee; Zhang, Nancy R. / Stanford University	$215,645
NIH 2012 R01 HG	Statistical Models for Genome Sequencing and Association Ji, Hanlee; Zhang, Nancy R. / Stanford University	$215,498
NIH 2011 R01 HG	Statistical Models for Genome Sequencing and Association Zhang, Nancy R.; Ji, Hanlee / Stanford University	$216,462

Publications

Zhou, Zilu; Wang, Weixin; Wang, Li-San et al. (2018) Integrative DNA copy number detection and genotyping from sequencing and array-based platforms. Bioinformatics 34:2349-2355

Xia, Li Charlie; Ai, Dongmei; Lee, Hojoon et al. (2018) SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. Gigascience 7:

Urrutia, Eugene; Chen, Hao; Zhou, Zilu et al. (2018) Integrative pipeline for profiling DNA copy number and inferring tumor phylogeny. Bioinformatics 34:2126-2128

Wang, Jingshu; Huang, Mo; Torre, Eduardo et al. (2018) Gene expression distribution deconvolution in single-cell RNA sequencing. Proc Natl Acad Sci U S A 115:E6437-E6446

Zhang, Hanrui; Zhang, Nancy R; Li, Mingyao et al. (2018) First Giant Steps Toward a Cell Atlas of Atherosclerosis. Circ Res 122:1632-1634

Huang, Mo; Wang, Jingshu; Torre, Eduardo et al. (2018) SAVER: gene expression recovery for single-cell RNA sequencing. Nat Methods 15:539-542

Ai, Dongmei; Huang, Ruocheng; Wen, Jin et al. (2017) Integrated metagenomic data analysis demonstrates that a loss of diversity in oral microbiota is associated with periodontitis. BMC Genomics 18:1041

Chen, Hao; Jiang, Yuchao; Maxwell, Kara N et al. (2017) ALLELE-SPECIFIC COPY NUMBER ESTIMATION BY WHOLE EXOME SEQUENCING. Ann Appl Stat 11:1169-1192

Jiang, Yuchao; Zhang, Nancy R; Li, Mingyao (2017) SCALE: modeling allele-specific gene expression by single-cell RNA sequencing. Genome Biol 18:74

Lau, Billy T; Ji, Hanlee P (2017) Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes. BMC Genomics 18:745

Showing the most recent 10 out of 38 publications
