Single-cell genomic technologies such as single-cell RNA-seq have emerged as powerful techniques to quantify molecular states of individual cells and can be used to elucidate the cellular building blocks of complex tissues and diseases. Given recent rapid advances in single-cell technologies, novel statistical and computational approaches are needed to efficiently analyze large-scale single-cell datasets with multiple data types such as gene and protein expression. Discrete Bayesian hierarchical models have been widely used for unsupervised modeling of discrete data types in fields such as Natural Language Processing (NLP). We have developed a Bayesian hierarchical model called Cellular Latent Dirichlet Allocation (Celda) to perform bi-clustering of genes into modules and cells into subpopulations. We will develop novel models that can perform clustering of cells into subpopulations using multi-modal genomic data or clustering of patients into subgroups using both single-cell data and patient-level characteristics. These novel methods will be made available in a scalable and interpretable cloud-based framework accessible to both computational and non-computational users.
The aims of this study are to (1) develop novel models to perform integrative multi-modal and multi-level clustering with single-cell data, (2) develop an R package and cloud-based platform with a web interface for rapid inference and visualization of large-scale datasets, and (3) apply Celda models to single-cell datasets from a variety of biological settings including cancer, lung development, and immunology. Overall, these aims will be accomplished by an interdisciplinary team with strong expertise in computational biology and bioinformatics, biostatistics, computer science, and molecular and cellular biology.

Public Health Relevance

Single-cell genomic technologies have emerged as powerful techniques to quantify molecular states of individual cells and can be used to elucidate the cellular building blocks of complex tissues and diseases. We will develop novel discrete Bayesian hierarchical models that can cluster cells into subpopulations using multiple data types and cluster patients into subgroups using both single-cell data and patient-level characteristics. These novel methods will be made available in a scalable cloud-based framework accessible to both computational and non-computational users.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM013154-01
Application #
9801687
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2019-08-01
Project End
2022-07-31
Budget Start
2019-08-01
Budget End
2020-07-31
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Boston University
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
604483045
City
Boston
State
MA
Country
United States
Zip Code
02118