Single-cell RNA sequencing (scRNAseq) has emerged as a powerful tool in genomics and has been used in a wide variety of applications, providing unprecedented insights into many basic biological questions that were previously difficult to address. However, analyzing scRNAseq data faces important statistical and computational challenges that require the development of new computational and statistical methods. Key challenges include: (1) lack of robust statistical methods that can control for hidden confounding effects in a range of settings; (2) lack of accurate cell subpopulation clustering methods that are tailored to scRNAseq studies; and (3) difficulty in identifying functional genetic variations with scRNAseq alone and difficulty in integrating scRNAseq with other genetic studies, including genome-wide association studies (GWASs). Our proposed methods will address these challenges and are innovative in the following aspects: (1) our method for controlling for hidden confounding effects bridges two existing classes of statistical methods for removing confounding effects and is thus expected to perform robustly across a range of scenarios; (2) our method for clustering cell subpopulations extracts clustering information from a low-dimensional representation of scRNAseq data and is thus expected to produce accurate results even when the original high-dimensional gene expression matrix is noisy; and (3) our method for identifying allele-specific/biased expression using scRNAseq data alone represents the first such attempt, as does our method for integrating scRNAseq with GWASs. All our proposed methods are tailored to scRNAseq data and will cope with the complexities and unique features of scRNAseq data, including, but not limited to, low coverage, count nature, and drop-out events. We will develop, distribute, and support user-friendly open-source software implementing our methods to benefit the genomics and statistics community.
The statistical methods developed here will pave the way for developing similar methods for other sequencing studies, including bisulfite sequencing and ATAC-seq studies. The proposed methods are essential for understanding the heterogeneity of tissue compositions and the genetic architecture of complex traits and diseases, both of which are questions of central importance to human health.

Public Health Relevance

Single-cell RNA sequencing (scRNAseq) has emerged as a powerful tool in genomics. The technology is essential for understanding the heterogeneity of tissue compositions and the genetic architecture of complex traits and diseases. This project aims to develop new methods to address statistical and computational challenges in scRNAseq data analysis and to facilitate the use of scRNAseq technology.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM126553-03
Application #
9700701
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Brazhnik, Paul
Project Start
2017-08-01
Project End
2022-05-31
Budget Start
2019-06-01
Budget End
2020-05-31
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Chicago
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
005421136
City
Chicago
State
IL
Country
United States
Zip Code
60637