Single-cell RNA sequencing (scRNA-seq) has recently emerged as a powerful technology for investigating transcriptomic variation and regulation at the level of individual cells. Traditional bulk RNA-seq pools RNA from a large number of cells and measures the average expression in a sample. In contrast, scRNA-seq reveals cell-to-cell heterogeneity, providing critical information for understanding biological processes in development, differentiation, and disease etiology. This new technology has expanded applications in both basic and clinical research, but its unique data characteristics also bring analytical challenges. These include: 1) difficulty in estimating molecule counts in the presence of technical artifacts, which arise from the small amount of starting material and additional sample preparation steps; 2) a lack of appropriate methods for functional clustering of single-cell RNA count data, which are much sparser than bulk RNA-seq data; and 3) a lack of methods to quantify and compare heterogeneity. We propose to address these challenges by developing a series of novel statistical methods for scRNA-seq data preprocessing and analysis. These include removing technical bias in RNA capture and amplification to obtain accurate molecule-level counts, identifying functional cell types and subtypes along with interpretable feature groups, explaining heterogeneity between samples and between cells, and identifying differential heterogeneity. All methods developed in this project will be implemented and released as free, open-source software to benefit the genomics research community. The probability model and statistical framework established in this proposal will lay a foundation for future methodological development for other single-cell sequencing experiments, such as single-cell ATAC-seq or bisulfite sequencing (BS-seq).

Public Health Relevance

The regulation of gene expression plays a vital role in human health. Single-cell RNA sequencing (scRNA-seq) is a new technology for characterizing expression variation at the individual-cell level. It offers a promising route to a better understanding of disease etiology and may lead to new drug targets and strategies for personalized treatment. This project will produce novel statistical methods for scRNA-seq data preprocessing and analysis, enabling more efficient and accurate analysis of scRNA-seq data.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM122083-02
Application #: 9332246
Study Section: Special Emphasis Panel (ZGM1)
Program Officer: Brazhnik, Paul
Project Start: 2016-08-15
Project End: 2021-07-31
Budget Start: 2017-08-01
Budget End: 2018-07-31
Support Year: 2
Fiscal Year: 2017
Organization: Emory University
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Public Health
DUNS #: 066469933
City: Atlanta
State: GA
Country: United States
Zip Code: 30322

Publications

Yao, Bing; Li, Yujing; Wang, Zhiqin et al. (2018) Active N6-Methyladenine Demethylation by DMAD Regulates Gene Expression by Coordinating with Polycomb Protein in Neurons. Mol Cell 71:848-857.e6
Cheng, Ying; Li, Ziyi; Manupipatpong, Sasicha et al. (2018) 5-Hydroxymethylcytosine alterations in the human postmortem brains of autism spectrum disorder. Hum Mol Genet 27:2955-2964
Xu, Tianlei; Zheng, Xiaoqi; Li, Ben et al. (2018) A comprehensive review of computational prediction of genome-wide features. Brief Bioinform
Feng, Hao; Jin, Peng; Wu, Hao (2018) Disease prediction by cell-free DNA methylation. Brief Bioinform
Wu, Zhijin; Zhang, Yi; Stitzel, Michael L et al. (2018) Two-phase differential expression analysis for single cell RNA-seq. Bioinformatics 34:3340-3348
Zhang, Feiran; Kang, Yunhee; Wang, Mengli et al. (2018) Fragile X mental retardation protein modulates the stability of its m6A-marked messenger RNA targets. Hum Mol Genet 27:3936-3950
Hong, Chuan; Ning, Yang; Wang, Shuang et al. (2017) PLEMT: A Novel Pseudolikelihood-Based EM Test for Homogeneity in Generalized Exponential Tilt Mixture Models. J Am Stat Assoc 112:1393-1404
Liao, Peizhou; Wu, Hao; Yu, Tianwei (2017) ROC Curve Analysis in the Presence of Imperfect Reference Standards. Stat Biosci 9:91-104
Zhang, Weiwei; Feng, Hao; Wu, Hao et al. (2017) Accounting for tumor purity improves cancer subtype classification from DNA methylation data. Bioinformatics 33:2651-2657
Zheng, Xiaoqi; Zhang, Naiqian; Wu, Hua-Jun et al. (2017) Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies. Genome Biol 18:17

The 10 most recent of the project's 11 publications are listed above.