Cells are the basic biological units of multicellular organisms. Recent technological breakthroughs have made it possible to measure gene expression at the single-cell level, paving the way for exploring gene expression heterogeneity among cells. The collection of abundances of all RNA species in a cell forms its "molecular fingerprint", enabling the investigation of many fundamental biological questions beyond those that can be addressed by traditional bulk RNA-seq experiments. Single-cell RNA-seq (scRNA-seq) allows us to better describe the lineage and type of single cells, characterize the stochasticity of gene expression across cells, and improve our understanding of cellular function in health and disease. ScRNA-seq analysis is transforming the biomedical sciences: it has already made a great impact in fields such as neuroscience and immunology, and it can enhance our understanding of disease development in numerous other contexts, including cardiometabolic diseases. However, scRNA-seq data present new challenges that standard analytical methods are not designed to confront. Current scRNA-seq protocols are complex, often introducing technical biases that vary across cells, which, if not properly removed, can obscure cell type identification and lead to biased results in downstream analyses. Published scRNA-seq studies have mainly been proof-of-principle studies illustrating the utility of scRNA-seq in cell type classification and other basic biological analyses. However, as the use of scRNA-seq continues to grow, researchers are beginning to explore its utility in disease gene discovery. Building upon our expertise in statistical methods development and our experience with the analysis of genomics data for human cardiometabolic diseases, we propose to develop novel statistical methods to address some of the key analytical challenges in scRNA-seq analysis.
We will guide methods development through the analysis of scRNA-seq data generated from ongoing collaborations with investigators at the University of Pennsylvania and Columbia University. We propose the following specific aims.
Aim 1: Develop methods to recover gene expression and identify cell types.
Aim 2: Develop methods to detect gene expression changes between cell types or conditions.
Aim 3: Develop methods to estimate isoform-specific gene expression and detect differential alternative splicing.
Aim 4: Develop methods to model allele-specific transcriptional bursting and its genetic regulation.
This proposal addresses critical challenges in scRNA-seq analysis, and it brings together an exceptional team of scientists with a proven track record in statistical genomics, single-cell biology, and cardiometabolic disease. The successful completion of this project will allow researchers to better disentangle complex cellular heterogeneity, precisely relate genomic sequence to gene regulation, and facilitate the translation of basic research findings into clinical studies of human disease.

Public Health Relevance

Cells are the basic biological units of multicellular organisms. The collection of abundances of all RNA species in a cell forms its "molecular fingerprint", enabling the investigation of many fundamental biological questions beyond those that can be addressed by traditional bulk RNA sequencing experiments. This proposal addresses critical statistical challenges in single-cell RNA sequencing analysis. The successful completion of this project will allow researchers to better disentangle complex cellular heterogeneity, precisely relate genomic sequence to gene regulation, and facilitate the translation of basic research findings into clinical studies of human disease.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM125301-01
Application #
9402782
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Resat, Haluk
Project Start
2017-09-01
Project End
2021-08-31
Budget Start
2017-09-01
Budget End
2018-08-31
Support Year
1
Fiscal Year
2017
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Zhang, Hanrui; Zhang, Nancy R; Li, Mingyao et al. (2018) First Giant Steps Toward a Cell Atlas of Atherosclerosis. Circ Res 122:1632-1634
Huang, Mo; Wang, Jingshu; Torre, Eduardo et al. (2018) SAVER: gene expression recovery for single-cell RNA sequencing. Nat Methods 15:539-542
Wang, Jingshu; Huang, Mo; Torre, Eduardo et al. (2018) Gene expression distribution deconvolution in single-cell RNA sequencing. Proc Natl Acad Sci U S A 115:E6437-E6446