The recent introduction of single-cell RNA sequencing (scRNA-seq) has revolutionized research in the biological sciences by revealing genome-wide gene expression levels, i.e., transcriptomes, cell by cell. For the first time, researchers are able to evaluate and compare the transcriptomes of individual cells: for instance, two cells in the same tissue but in different microenvironments, two neurons in different developmental states, or a normal cell and one undergoing a degenerative process. However, the technology still has limitations that restrict its quantitative power. Researchers face an experimental trade-off between profiling fewer cells with higher accuracy and profiling more cells for a broader survey of gene expression. Further, because scRNA-seq technology is still so new, many experimental protocols exist, each subject to different biases and errors; this diversity hinders data validation, cross-referencing between labs, and the normalization and integration of data from public repositories. This project will enhance biological research across the many disciplines that use this type of assay by advancing scRNA-seq data analysis and providing critical new tools for investigating the molecular mechanisms underlying particular cellular states, including disease states such as cancers and neurological disorders. Because scRNA-seq technology is still new, the project also aims to be at the frontier of education and method development, disseminating knowledge to trainees in statistics and biology at all levels. Teaching activities will capitalize on the excitement of individual-cell analysis through scRNA-seq data to deepen undergraduate students' understanding of statistical analysis and to attract underrepresented minority students to the quantitative sciences.

This project will establish the computational infrastructure needed to design scRNA-seq experiments and analyze the data they produce. A statistical and computational simulator will be developed to help researchers design more effective scRNA-seq experiments at significantly lower cost. Another goal is to build an scRNA-seq database that organizes individual-cell transcriptomes in a hierarchical taxonomy of cell types, providing a benchmark resource for computational method development. Assisted by the simulator and the database, the project will develop a new suite of statistical and computational methods to increase the resolution and accuracy of scRNA-seq data analysis. These methods will serve as bioinformatic tools that allow researchers to quantify genome-wide transcripts in individual cells, identify differentially expressed genes at cell-subtype resolution, and compare the transcriptomes of individual human and mouse cells. The infrastructure and methods developed in this project will enable and expedite scientific discoveries from scRNA-seq data and will serve both experimentalists and computationalists in the scRNA-seq field.

Results of this project, including research papers, software packages, and video tutorials, will be made available at http://jsb.ucla.edu.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1846216
Program Officer: Jean Gao
Budget Start: 2019-07-01
Budget End: 2024-06-30
Fiscal Year: 2018
Total Cost: $237,956
Name: University of California Los Angeles
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095