Single-cell RNA-Seq (scRNA-Seq) analyses have revolutionized how researchers investigate specific cell types within tissue samples. While single-cell sequencing technologies have opened a new frontier for researchers, they also come with a complex set of problems. One of these concerns the quality of gene expression estimates, which are used in numerous downstream analyses, from predicting cell types and trajectories to determining differentially expressed genes between cell types or tissues. Low coverage and sequencing inefficiencies can affect up to 90% of gene expression estimates in scRNA-Seq studies and are therefore challenging to overcome. Current methods that attempt to address this issue have two critical shortcomings: (1) inadequate use of bulk data to compensate for low-expression genes and (2) under-utilization of iterative procedures to optimize the highly connected steps involved in imputing gene expression estimates.

This project will develop a novel computational framework that integrates bulk RNA-Seq data into scRNA-Seq data modeling and analysis, with the goal of producing accurate gene expression estimates from sparse scRNA-Seq data and improving the quality, reliability, and precision of downstream analyses. The aim is to model particular features of the heterogeneous gene expression patterns among various cell types. Bulk RNA-Seq data will be integrated through deconvolution to develop heterogeneous compensation distributions and probabilities. The gamma distribution will be used to determine an empirical distribution for single-cell gene expression estimates, which will improve the baseline expression for a specific cell type and help identify estimates of interest despite the high level of noise in sequencing data. These estimates will then be combined with compensation information from bulk RNA-Seq data to correct biases in the noisy scRNA-Seq data. Finally, the updated expression estimates will be fed back through the process iteratively to improve the results at each stage. The outcome will be a novel imputation framework that should enable more accurate scRNA-Seq expression estimates by integrating the three new characteristics described above.
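The workflow described above (deconvolution of bulk data, gamma-based baselines, and iterative refinement) can be illustrated with a minimal sketch. This is not the project's method: the function names, the use of non-negative least squares for deconvolution, the gamma median as the baseline, and the fixed mixing weight `mix` are all assumptions made for illustration only.

```python
import numpy as np
from scipy import stats
from scipy.optimize import nnls


def deconvolve_bulk(bulk, signatures):
    """Estimate cell-type proportions for one bulk profile (assumed approach: NNLS).

    bulk: (genes,) bulk expression; signatures: (genes, cell_types) mean profiles.
    """
    weights, _ = nnls(signatures, bulk)
    total = weights.sum()
    return weights / total if total > 0 else weights


def gamma_baseline(values):
    """Baseline for one gene in one cell type: median of a gamma fit to positive values.

    Using the median is an illustrative choice, not taken from the abstract.
    """
    positive = values[values > 0]
    if positive.size == 0:
        return 0.0
    if positive.size < 3 or np.allclose(positive, positive[0]):
        return float(np.median(positive))  # too little data for a stable fit
    shape, _, scale = stats.gamma.fit(positive, floc=0)
    return float(stats.gamma.ppf(0.5, shape, loc=0, scale=scale))


def impute(sc, labels, bulk, n_iter=3, mix=0.5):
    """Fill zero entries of sc (cells x genes) using gamma baselines and bulk data.

    labels: integer cell-type ids 0..K-1 per cell; bulk: (genes,) bulk profile.
    mix is a hypothetical weight between the two information sources.
    """
    sc = np.asarray(sc, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    labels = np.asarray(labels)
    observed = sc > 0
    n_types = int(labels.max()) + 1
    # Gamma baselines depend only on observed values, so compute them once.
    baseline = np.array([
        [gamma_baseline(sc[labels == k, g]) for k in range(n_types)]
        for g in range(sc.shape[1])
    ])  # shape (genes, cell_types)
    est = sc.copy()
    for _ in range(n_iter):
        # Cell-type signatures from the current estimates, shape (genes, cell_types).
        signatures = np.stack(
            [est[labels == k].mean(axis=0) for k in range(n_types)], axis=1
        )
        props = deconvolve_bulk(bulk, signatures)
        predicted = signatures @ props
        # Per-gene factor that rescales signatures toward the bulk profile.
        ratio = np.divide(bulk, predicted, out=np.ones_like(predicted),
                          where=predicted > 0)
        compensation = signatures * ratio[:, None]
        fill = mix * baseline + (1.0 - mix) * compensation
        # Keep observed values; replace zeros with the cell-type-specific fill.
        est = np.where(observed, sc, fill[:, labels].T)
    return est
```

Under these assumptions, calling `impute(sc, labels, bulk)` returns a matrix of the same shape in which observed values are left untouched and zero entries are replaced by a blend of the gamma baseline and the bulk-derived compensation. Each pass recomputes the cell-type signatures from the previous estimates, which is one simple way to realize the feedback loop the abstract describes.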

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1945971
Program Officer: Jean Gao
Project Start:
Project End:
Budget Start: 2021-03-01
Budget End: 2023-02-28
Support Year:
Fiscal Year: 2019
Total Cost: $299,999
Indirect Cost:
Name: Ohio State University
Department:
Type:
DUNS #:
City: Columbus
State: OH
Country: United States
Zip Code: 43210