Single-cell RNA-sequencing (scRNA-seq) has recently emerged as a powerful technology for investigating transcriptomic variation and regulation at the level of individual cells. Traditional bulk RNA-seq pools RNA from a large number of cells and measures the average expression in a sample. In contrast, scRNA-seq reveals cell-to-cell heterogeneity, providing critical information for understanding biological processes in development, differentiation, and disease etiology. This new technology has expanded applications in both basic and clinical research, but its distinctive data characteristics also pose analytical challenges. These include: 1) difficulty in estimating molecule counts in the presence of technical artifacts arising from the small amount of starting material and additional sample preparation steps; 2) a lack of appropriate methods for functional clustering of single-cell RNA count data, which are much sparser than bulk RNA-seq data; and 3) the lack of a quantitative measure of heterogeneity and of methods for comparing it. We propose to address these challenges by developing a series of novel statistical methods for scRNA-seq data preprocessing and analysis. These include removing technical bias in RNA capture and amplification to obtain accurate molecule-level counts, identifying functional types/subtypes of cells and interpretable feature groups, explaining heterogeneity between samples and cells, and identifying differential heterogeneity. All methods developed in this project will be implemented and released as free, open-source software to benefit the genomics research community. The probability model and statistical framework established in this proposal will lay a foundation for future methodology development for other single-cell sequencing experiments, such as single-cell ATAC-seq or BS-seq.
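To illustrate why bulk averaging can hide the heterogeneity that scRNA-seq exposes, the toy sketch below compares two hypothetical samples of one gene. Both have the same pooled (bulk-style) average, yet their per-cell counts differ sharply. The counts and the helper names `bulk_average`, `zero_fraction`, and `cell_variance` are hypothetical. Zero fraction and across-cell variance serve here only as simple illustrative summaries of sparsity and heterogeneity; they are not the statistical methods proposed in this project.

```python
from statistics import mean, pvariance

# Hypothetical per-cell counts for a single gene, six cells per sample.
# Sample A: homogeneous, every cell expresses the gene at a moderate level.
sample_a = [3, 3, 3, 3, 3, 3]
# Sample B: heterogeneous, half the cells express highly and half have zero counts.
sample_b = [6, 0, 6, 0, 6, 0]


def bulk_average(counts):
    """Mimic bulk RNA-seq: pool all cells and report the average count per cell."""
    return mean(counts)


def zero_fraction(counts):
    """Proportion of cells with a zero count (a simple sparsity summary)."""
    return sum(1 for c in counts if c == 0) / len(counts)


def cell_variance(counts):
    """Population variance across cells (a simple heterogeneity summary)."""
    return pvariance(counts)


if __name__ == "__main__":
    for name, counts in [("A", sample_a), ("B", sample_b)]:
        print(name, bulk_average(counts), zero_fraction(counts), cell_variance(counts))
```

The bulk-style average is identical for both samples, while the zero fraction and across-cell variance separate them. This is the kind of distinction that pooled measurements cannot make.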
The regulation of gene expression plays a vital role in human health. Single-cell RNA-sequencing (scRNA-seq) is a new technology for characterizing expression variation at the individual-cell level. It offers a promising route to a better understanding of disease etiology and may lead to new drug targets and strategies for personalized treatment. This project will produce novel statistical methods for scRNA-seq data preprocessing and analysis, enabling more efficient and accurate analysis of scRNA-seq data.
Yao, Bing; Li, Yujing; Wang, Zhiqin et al. (2018) Active N6-Methyladenine Demethylation by DMAD Regulates Gene Expression by Coordinating with Polycomb Protein in Neurons. Mol Cell 71:848-857.e6
Cheng, Ying; Li, Ziyi; Manupipatpong, Sasicha et al. (2018) 5-Hydroxymethylcytosine alterations in the human postmortem brains of autism spectrum disorder. Hum Mol Genet 27:2955-2964
Xu, Tianlei; Zheng, Xiaoqi; Li, Ben et al. (2018) A comprehensive review of computational prediction of genome-wide features. Brief Bioinform
Feng, Hao; Jin, Peng; Wu, Hao (2018) Disease prediction by cell-free DNA methylation. Brief Bioinform
Wu, Zhijin; Zhang, Yi; Stitzel, Michael L et al. (2018) Two-phase differential expression analysis for single cell RNA-seq. Bioinformatics 34:3340-3348
Zhang, Feiran; Kang, Yunhee; Wang, Mengli et al. (2018) Fragile X mental retardation protein modulates the stability of its m6A-marked messenger RNA targets. Hum Mol Genet 27:3936-3950
Hong, Chuan; Ning, Yang; Wang, Shuang et al. (2017) PLEMT: A novel pseudolikelihood based EM test for homogeneity in generalized exponential tilt mixture models. J Am Stat Assoc 112:1393-1404
Liao, Peizhou; Wu, Hao; Yu, Tianwei (2017) ROC Curve Analysis in the Presence of Imperfect Reference Standards. Stat Biosci 9:91-104
Zhang, Weiwei; Feng, Hao; Wu, Hao et al. (2017) Accounting for tumor purity improves cancer subtype classification from DNA methylation data. Bioinformatics 33:2651-2657
Zheng, Xiaoqi; Zhang, Naiqian; Wu, Hua-Jun et al. (2017) Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies. Genome Biol 18:17