Statistical analysis of single cell RNA-sequencing data: detecting dropouts and removing the cell-cycle effect

Li, Jun

Abstract

Single cell RNA-Sequencing (scRNA-Seq) is starting a revolution in cancer genomics research. This technique can measure gene expression in each individual cell, for hundreds to thousands of cells in a single experiment. Tumors are inherently heterogeneous: different types of cells cooperate with each other so that this deadly disease can invade, metastasize, and develop therapy resistance. ScRNA-Seq gives researchers an unprecedented ability to determine and describe these cell types, and thus to uncover the mechanisms behind cancer. Ultimately, this insight will lead to better therapy against cancer. The tools of single cell sequencing also have direct translational applications in the clinic, in areas such as early detection, noninvasive monitoring, and guiding targeted therapy. However, the development of data analysis methods is seriously lagging, which may have led to dubious biological findings and can seriously obstruct future discoveries. ScRNA-Seq data show distinct features that can cause real problems. First and foremost, many genes (even moderately or highly expressed genes) have expression measurements of zero, and many of these zeros are experimental artifacts. Second, the gene expression measurement can be heavily biased by the cell-cycle effect. Both features can cause serious difficulties in identifying and describing cell types and in many other applications of scRNA-Seq data. The goal of this proposal is to develop stand-alone methods that handle these features and deliver clearer and less biased data, which will then boost the power of subsequent statistical analyses and facilitate exciting biological discoveries. We propose Aim 1 to detect zeros that are experimental artifacts and infer their "true" values, and Aim 2 to remove the cell-cycle effect. Moreover, in Aim 3, we will develop user-friendly software that implements our algorithms and make it publicly available.
Our algorithms/software will output much clearer and less biased data. To answer any specific biological question of interest with these data, cancer researchers can either use existing algorithms developed for bulk RNA-Seq or microarray data, or develop new algorithms without special care for the troublesome features of raw scRNA-Seq data. This greatly reduces the load of data analysis and will accelerate biological/medical discoveries. We expect that our software will serve as an essential pre-processing step for any application of scRNA-Seq data.

Public Health Relevance

Single-cell RNA-Sequencing gives researchers an unprecedented ability to determine and describe different cell types and their dynamics in tumor tissues, and thus to uncover the mechanisms behind cancer and ultimately arrive at better therapy against cancer. This proposal introduces new strategies and statistical methods to clean up the very noisy and highly biased data generated by this pioneering technique. The cleaned data we deliver will greatly simplify follow-up data analysis and accelerate biological/medical discoveries.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Small Research Grants (R03)
Project #: 1R03CA212964-01
Application #: 9233410
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Li, Jerry

Project Start: 2017-09-01
Project End: 2019-08-31
Budget Start: 2017-09-01
Budget End: 2018-08-31
Support Year: 1
Fiscal Year: 2017
Total Cost
Indirect Cost

Institution

Name: University of Notre Dame
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 824910376

City: Notre Dame
State: IN
Country: United States
Zip Code: 46556

Related projects


NIH 2018 R03 CA	Statistical analysis of single cell RNA-sequencing data: detecting dropouts and removing the cell-cycle effect	Li, Jun / University of Notre Dame
NIH 2017 R03 CA	Statistical analysis of single cell RNA-sequencing data: detecting dropouts and removing the cell-cycle effect	Li, Jun / University of Notre Dame

Publications

Barron, Martin; Zhang, Siyuan; Li, Jun (2018) A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data. Nucleic Acids Res 46:e14
