Single-cell RNA sequencing (scRNAseq) has emerged as a powerful tool in genomics and has been used in a wide variety of applications, providing unprecedented insights into many basic biological questions that were previously difficult to address. However, analyzing scRNAseq data faces important statistical and computational challenges that require the development of new methods. Key challenges include: (1) the lack of robust statistical methods that can control for hidden confounding effects across a range of settings; (2) the lack of accurate cell subpopulation clustering methods tailored to scRNAseq studies; and (3) the difficulty of identifying functional genetic variants with scRNAseq alone, and of integrating scRNAseq with other genetic studies, including genome-wide association studies (GWASs). Our proposed methods will address these challenges and are innovative in the following aspects: (1) our method for controlling for hidden confounding effects bridges two existing classes of statistical methods for removing confounding effects and is thus expected to perform robustly across a range of scenarios; (2) our method for clustering cell subpopulations extracts clustering information from a low-dimensional representation of scRNAseq data and is thus expected to produce accurate results even when the original high-dimensional gene expression matrix is noisy; and (3) our method for identifying allele-specific (allele-biased) expression using scRNAseq data alone and our method for integrating scRNAseq with GWASs each represent the first such attempt. All our proposed methods are tailored to scRNAseq data and will cope with its complexities and unique features, including, but not limited to, low coverage, the count nature of the data, and dropout events. We will develop, distribute, and support user-friendly open-source software implementing our methods to benefit the genomics and statistics communities.
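To illustrate the general idea behind aim (2), clustering cells in a low-dimensional space rather than on the noisy high-dimensional expression matrix, the sketch below uses a generic and widely used stand-in: log-normalization, principal component analysis (PCA), and k-means. This is not the proposed method. The normalization scale, the number of components, the helper names `pca_embed` and `kmeans`, and the simulated two-population data with crude random dropout are all hypothetical choices made for illustration.

```python
import numpy as np

def pca_embed(counts, n_components=2):
    """Hypothetical helper: log-normalize a cells x genes count matrix and
    return each cell's scores on the top principal components."""
    lib = counts.sum(axis=1, keepdims=True).astype(float)
    lib[lib == 0] = 1.0                      # avoid dividing by zero for empty cells
    x = np.log1p(counts / lib * 1e4)         # library-size scaling, then log1p
    x = x - x.mean(axis=0)                   # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # cell scores on the top PCs

def kmeans(z, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's k-means on the low-dimensional scores."""
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every cell to every center
        d = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties out
        new = np.array([z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Simulated example (assumed data): two populations with distinct marker genes.
rng = np.random.default_rng(1)
n_per, n_genes = 50, 200
rates = np.ones((2 * n_per, n_genes))
rates[:n_per, :20] = 20.0      # marker genes for population 0
rates[n_per:, 20:40] = 20.0    # marker genes for population 1
counts = rng.poisson(rates).astype(float)
counts[rng.random(counts.shape) < 0.3] = 0.0   # crude random dropout
truth = np.repeat([0, 1], n_per)

labels = kmeans(pca_embed(counts), k=2)
# cluster labels are arbitrary, so score agreement up to label swapping
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with true populations: {agreement:.2f}")
```

Clustering on a handful of principal component scores pools signal across many co-varying genes, which is one common way to reduce sensitivity to per-gene noise such as dropout; the proposed method may extract clustering information from the low-dimensional representation differently.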
The statistical methods developed here will pave the way for developing similar methods for other sequencing studies, including bisulfite sequencing and ATAC-seq studies. The proposed methods are essential for understanding the heterogeneity of tissue composition and the genetic architecture of complex traits and diseases, both of which are questions of central importance to human health.
Single-cell RNA sequencing (scRNAseq) has emerged as a powerful tool in genomics. The technology is essential for understanding the heterogeneity of tissue composition and the genetic architecture of complex traits and diseases. This project aims to develop new methods that address statistical and computational challenges in scRNAseq data analysis and facilitate the use of scRNAseq technology.
|Zeng, Ping; Hao, Xingjie; Zhou, Xiang (2018) Pleiotropic mapping and annotation selection in genome-wide association studies with penalized Gaussian mixture models. Bioinformatics 34:2797-2807|
|Chen, Mengjie; Zhou, Xiang (2018) VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies. Genome Biol 19:196|
|Hao, Xingjie; Zeng, Ping; Zhang, Shujun et al. (2018) Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies. PLoS Genet 14:e1007186|
|Zhou, Xiang (2017) A unified framework for variance component estimation with summary statistics in genome-wide association studies. Ann Appl Stat 11:2027-2051|
|Chen, Mengjie; Zhou, Xiang (2017) Controlling for Confounding Effects in Single Cell RNA Sequencing Studies Using both Control and Target Genes. Sci Rep 7:13587|
|Yang, Jingjing; Fritsche, Lars G; Zhou, Xiang et al. (2017) A Scalable Bayesian Method for Integrating Functional Information in Genome-wide Association Studies. Am J Hum Genet 101:404-416|