Defining the features of cellular mixtures, where diverse cell types with distinct genomic characteristics are physically intermingled, is a central problem in biology. During the past decade, single cell sequencing technologies have enabled a new era of high-throughput, high-resolution interrogation of cell type diversity, vastly expanding our understanding of the role that cell types play in development and disease. Yet current studies in single cell genomics rely on short-read sequencing and thus suffer from several limitations: (1) most studies rely on short-read counting, which limits the study of alternative splicing; (2) cell states are captured as static snapshots, and while population dynamics can be inferred through trajectory and RNA velocity estimation, robust estimation of these parameters remains a major challenge; (3) despite advances in single-cell DNA sequencing, there is as yet no cost-effective way to simultaneously characterize both the genetic variants and the transcriptome-level changes in a cell, which is crucial for diseases such as cancer. This proposal is motivated by technological breakthroughs in single-molecule sequencing (SMS) and the recent adaptation of SMS to the massively parallel sequencing of single cell transcriptomes in our lab. We propose to develop computational methods to harness the power of SMS in single cell transcriptomics. In particular, we have developed a new genomic approach that allows one to repeatedly interrogate complete transcripts from single cells using SMS long reads, rather than 3' or 5' counting with short reads. This technology allows experimental designs in which specific transcript subsets and/or cellular subsets can be repeatedly targeted for deeper joint short- and long-read analysis over many iterations, which we will exploit to conduct analyses that were previously intractable.

Public Health Relevance

During the past decade, single cell sequencing technologies have enabled a new era of high-throughput, high-resolution interrogation of cell type diversity, vastly expanding our understanding of the role that cell types play in development and disease. This proposal is motivated by technological breakthroughs in single-molecule sequencing (SMS) and the recent adaptation of SMS to the massively parallel sequencing of single cell transcriptomes. We propose to develop computational methods to harness the power of SMS in single cell transcriptomics, thus improving the analysis of disease-related changes in individual genomes and cells.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG006137-10
Application #: 10050892
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Gilchrist, Daniel A
Project Start: 2011-07-06
Project End: 2023-06-30
Budget Start: 2020-09-01
Budget End: 2021-06-30
Support Year: 10
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 042250712
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Zhang, Hanrui; Zhang, Nancy R; Li, Mingyao et al. (2018) First Giant Steps Toward a Cell Atlas of Atherosclerosis. Circ Res 122:1632-1634
Huang, Mo; Wang, Jingshu; Torre, Eduardo et al. (2018) SAVER: gene expression recovery for single-cell RNA sequencing. Nat Methods 15:539-542
Zhou, Zilu; Wang, Weixin; Wang, Li-San et al. (2018) Integrative DNA copy number detection and genotyping from sequencing and array-based platforms. Bioinformatics 34:2349-2355
Xia, Li Charlie; Ai, Dongmei; Lee, Hojoon et al. (2018) SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. Gigascience 7:
Urrutia, Eugene; Chen, Hao; Zhou, Zilu et al. (2018) Integrative pipeline for profiling DNA copy number and inferring tumor phylogeny. Bioinformatics 34:2126-2128
Wang, Jingshu; Huang, Mo; Torre, Eduardo et al. (2018) Gene expression distribution deconvolution in single-cell RNA sequencing. Proc Natl Acad Sci U S A 115:E6437-E6446
Shin, GiWon; Grimes, Susan M; Lee, HoJoon et al. (2017) CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis. Nat Commun 8:14291
Greer, Stephanie U; Nadauld, Lincoln D; Lau, Billy T et al. (2017) Linked read sequencing resolves complex genomic rearrangements in gastric cancer metastases. Genome Med 9:57
Ai, Dongmei; Huang, Ruocheng; Wen, Jin et al. (2017) Integrated metagenomic data analysis demonstrates that a loss of diversity in oral microbiota is associated with periodontitis. BMC Genomics 18:1041
Chen, Hao; Jiang, Yuchao; Maxwell, Kara N et al. (2017) Allele-specific copy number estimation by whole exome sequencing. Ann Appl Stat 11:1169-1192

Showing the most recent 10 out of 38 publications