Both healthy and diseased tissues are composed of multiple cell types whose interplay underpins their functions. Even within a given cell type, individual cells differ in their exact state because of distinct external influences, the cell's history, or random noise. Such heterogeneity presents a major challenge for modeling and understanding biological tissues. Rapidly advancing single-cell measurements can now provide comprehensive data on cell state, enabling unbiased analysis of the composition of complex tissues. Such measurements, however, are inherently noisy and require specialized statistical and computational tools for their analysis. The project will develop sensitive statistical methods for the identification and characterization of biologically distinct subsets of cells from single-cell transcriptome data, and will apply them to investigate the cellular composition and function of neural tissues in humans, mice and other organisms. Such characterization will likely provide valuable insights into the mechanisms underlying brain development. The developed tools will be widely applicable in other biological contexts. Finally, the instructional material created by this project will introduce beginning and advanced students to the fundamental set of statistical and computational techniques necessary to understand modern biological measurements.
The research will develop an approach for analyzing cell heterogeneity based on single-cell transcriptome data. Model-based factor analysis will be used to capture the structure of transcriptional heterogeneity within the measured cell populations in a way that is highly tolerant of the technical and biological variability inherent to single-cell measurements. Methods for incorporating available spatial information and predicting the spatial localization of subpopulations will be developed. Statistical methods will be developed to identify gene regulatory dependencies from single-cell data. The approaches will be applied to the analysis of transcriptional heterogeneity and regulatory processes in neuronal tissues of humans and model organisms. To facilitate teaching of the relevant analysis methods, the project will develop a series of interactive exercises illustrating the common counting processes that underlie the assumptions of most analysis methods, statistical tools for estimating uncertainty from count data, and common algorithms used to process single-cell sequencing data. To reach a wider student audience, the interactive instructional tools will have an adaptable difficulty level and will be directly accessible over the web. More information and ongoing results of this project will be posted at: http://pklab.med.harvard.edu/peterk/nsf/CAREER.html
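As an illustration of the kind of counting process and count-data uncertainty estimate mentioned above, the following minimal Python sketch may be useful. It is not taken from the project's materials: the function names, the capture probability, and the choice of a normal-approximation interval are all assumptions made for the example. It treats each transcript molecule in a cell as being captured independently with a small probability, which yields approximately Poisson-distributed counts, and then estimates the mean count with an approximate 95% interval.

```python
import math
import random


def simulate_counts(true_molecules, capture_prob, n_cells, seed=0):
    """Simulate observed counts of one gene across cells (hypothetical model).

    Each of `true_molecules` transcripts is captured independently with
    probability `capture_prob`, so counts are binomial and, for a small
    capture probability, approximately Poisson.
    """
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(true_molecules) if rng.random() < capture_prob)
        for _ in range(n_cells)
    ]


def poisson_rate_interval(counts, z=1.96):
    """Estimate the mean count with a normal-approximation interval.

    Under a Poisson assumption the variance equals the mean, so the
    standard error of the sample mean is sqrt(mean / n).
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must not be empty")
    mean = sum(counts) / n
    half_width = z * math.sqrt(mean / n)
    return mean, max(0.0, mean - half_width), mean + half_width


if __name__ == "__main__":
    # Assumed values: 200 molecules per cell, 5% capture, 500 cells.
    counts = simulate_counts(true_molecules=200, capture_prob=0.05, n_cells=500)
    mean, low, high = poisson_rate_interval(counts)
    print(f"mean count = {mean:.2f}, approx. 95% interval = ({low:.2f}, {high:.2f})")
```

The interval narrows as more cells are measured, which is one simple way to show students why low per-cell counts still allow reliable population-level estimates. The normal approximation is a convenience here and becomes unreliable when the mean count is very close to zero.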