Single-cell RNA-sequencing (scRNA-seq) has recently emerged as a powerful technology for investigating transcriptomic variation and regulation at the individual-cell level. Traditional bulk RNA-seq pools RNA from a large number of cells and measures the average expression in a sample. In contrast, scRNA-seq reveals cell-to-cell heterogeneity, providing critical information for understanding biological processes in development, differentiation, and disease etiology. This new technology has led to an expansion of applications in both basic and clinical research, but its unique data characteristics also bring analytical challenges. These include: 1) difficulty in estimating molecule counts in the presence of technical artifacts, which arise from the small amount of starting material and additional sample preparation procedures; 2) a lack of appropriate methods for functional clustering of single-cell RNA count data, which are much sparser than bulk RNA-seq data; and 3) the lack of a quantitative measure of heterogeneity and of a means to compare it. We propose to address these challenges by developing a series of novel statistical methods for scRNA-seq data preprocessing and analysis. These methods will remove technical bias in RNA capture and amplification to obtain accurate molecule-level counts, identify functional types/subtypes of cells and interpretable feature groups, explain heterogeneity between samples and cells, and identify differential heterogeneity. All methods developed in this project will be implemented and released as free, open-source software to benefit the genomics research community. The probability model and statistical framework established in this proposal will lay a foundation for future methodology development for other single-cell sequencing experiments, such as single-cell ATAC-seq or BS-seq.
The regulation of gene expression plays a vital role in human health. Single-cell RNA-sequencing (scRNA-seq) is a new technology for characterizing expression variation at the individual-cell level. It offers a promising route toward a better understanding of disease etiology, new drug targets, and strategies for personalized treatment. This project will produce novel statistical methods for scRNA-seq data preprocessing and analysis, enabling more efficient and accurate analysis of scRNA-seq data.