The gene expression values measured on a tissue sample are the ensemble expression of all of its constituent cells. Studying gene expression derived from tissue samples is complicated by the fact that heterogeneous cell types coexist in the tissue. For a full exploration of how different cell types interactively affect the development of a tissue or a disease, and more generally for robust mechanistic interpretation of tissue-based transcriptomic data, it is important to decompose the observed tissue-level gene expression into the combination of expression levels of its cell components. A gene's expression in an individual cell is regulated by a set of transcriptional regulatory signals (TRSs) such as transcription factors, miRNAs, lncRNAs, and epigenomic regulators. Deciphering cell-type specific expression contributions is equivalent to identifying the true cell-type specific TRSs in the different cell components of a tissue sample. Because the highly diverse TRS types in mammalian cells cannot be measured simultaneously by current experimental methods, we will model and quantify cell-type specific TRSs via mathematically well-defined co-regulation modules of their regulated genes, based on single-cell RNA-Seq data. We hypothesize that the genes co-regulated by a common TRS in multiple cells can be characterized in single-cell RNA-Seq data and form gene signatures of the TRS. Mathematically, this problem can be formulated as the detection of a submatrix in a single-cell expression matrix in which the genes share coherent expression patterns over certain single-cell samples. Our preliminary data demonstrated that this problem can be solved by a biclustering-based local low-rank submatrix detection approach.
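As a rough illustration of the submatrix formulation above, the following minimal Python sketch (NumPy only) scores how close a candidate gene-by-cell block is to rank 1. It is not the proposed biclustering algorithm: the synthetic matrix, the planted module, and the helper name `rank1_energy` are all assumptions made for illustration.

```python
import numpy as np

def rank1_energy(submatrix):
    """Fraction of the squared Frobenius norm captured by the top singular value.

    Values near 1 suggest that the rows (genes) share one coherent pattern
    across the columns (cells), i.e. the block is approximately rank 1.
    """
    singular_values = np.linalg.svd(submatrix, compute_uv=False)
    total = float(np.sum(singular_values ** 2))
    return float(singular_values[0] ** 2 / total) if total > 0 else 0.0

# Hypothetical single-cell expression matrix: 60 genes x 40 cells of pure noise.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(60, 40))

# Plant one co-regulation module (assumed): genes 0-9 respond to a shared
# signal whose activity varies over cells 0-14, giving a rank-1 block.
gene_loading = rng.uniform(2.0, 4.0, size=10)
cell_activity = rng.uniform(2.0, 4.0, size=15)
X[:10, :15] += np.outer(gene_loading, cell_activity)

planted_score = rank1_energy(X[:10, :15])         # block containing the module
background_score = rank1_energy(X[30:40, 20:35])  # same-shaped block of noise
```

On this synthetic example the planted block scores close to 1, while a pure-noise block of the same shape scores much lower. A real detection method would additionally have to search over gene and cell subsets, which this sketch does not attempt.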
In this project, we propose the development of a computational infrastructure to derive gene signatures of cell-type specific TRSs from single-cell RNA-Seq data and to decompose tissue transcriptomic data into the contributions of TRSs in its component cells. Specifically, we have the following three aims: (1) Mathematically model TRSs and their associated co-regulation gene modules through transcriptomic profiles of single cells; (2) Develop a novel biclustering algorithm for identifying condition/cell-type specific co-regulated gene modules in single-cell transcriptomic data; and (3) Identify and annotate the gene signatures for each TRS, and estimate the level of each TRS in independent tissue data. Recent studies have revealed the crucial impact of stromal and immune cells on the progression and metastasis of cancer. We will apply the computational methods to TCGA tissue expression data and single-cell expression data from other sources to quantitatively estimate the level of cell-type specific TRSs for different cell types within a cancer tissue. All the developed computational tools and derived knowledge will be maintained in a web server/database for public use.
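One simple way to picture the decomposition step in aim (3) is as a non-negative least-squares fit of a tissue expression vector to a matrix of TRS gene signatures. The sketch below assumes a linear mixing model and uses plain projected gradient descent; the function name `estimate_trs_levels`, the random signature matrix, and the noise-free tissue profile are hypothetical and do not represent the project's actual method.

```python
import numpy as np

def estimate_trs_levels(signatures, tissue, n_iter=5000):
    """Non-negative least squares via projected gradient descent.

    signatures: genes x TRS matrix of (assumed) signature expression levels.
    tissue:     length-genes vector of tissue-level expression.
    Returns a non-negative level for each TRS.
    """
    # Step size 1/L, where L is the largest eigenvalue of signatures^T signatures.
    step = 1.0 / np.linalg.norm(signatures, 2) ** 2
    levels = np.zeros(signatures.shape[1])
    for _ in range(n_iter):
        gradient = signatures.T @ (signatures @ levels - tissue)
        # Gradient step followed by projection onto the non-negative orthant.
        levels = np.maximum(levels - step * gradient, 0.0)
    return levels

# Synthetic example (assumed): 50 genes, 3 TRS signatures, noise-free mixture.
rng = np.random.default_rng(1)
signatures = rng.uniform(0.0, 1.0, size=(50, 3))
true_levels = np.array([0.6, 0.3, 0.1])
tissue = signatures @ true_levels

estimated_levels = estimate_trs_levels(signatures, tissue)
```

With noise-free synthetic input the estimated levels closely match the values used to build the mixture. Real tissue data would add measurement noise and signatures that overlap between cell types, so recovery would be less exact.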

Public Health Relevance

A major challenge in utilizing tissue transcriptomics data is the co-existence of phenotypically and functionally distinct cell populations. In this project, we will develop a computational infrastructure to derive gene expression signatures of cell-type specific transcriptional regulatory signals using single-cell RNA-Seq data. The gene expression signatures can then be applied to decompose tissue transcriptomic data into the combination of expression levels of its cell components.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Resat, Haluk
South Dakota State University
Other Basic Sciences
Earth Sciences/Resources
United States