Many projects have characterized the human brain transcriptome within and across cell types to better understand changes in RNA expression associated with brain development and aging, developmental or psychiatric brain disorders, and genetic variation. Large consortia, including the psychENCODE, CommonMind, and BrainSeq consortia, have primarily focused on the molecular profiling of RNA extracted from homogenate (bulk) tissue from different brain regions across thousands of individuals. However, bulk tissue such as the frontal cortex contains a mixture of distinct cell populations, and failing to account for the underlying composition of tissue samples can cause both false positives and missed signal in differential expression analysis. Therefore, statistical methods referred to as cellular deconvolution have been developed to estimate the relative fractions of different cell types in bulk RNA-seq datasets. These cell fractions can then be used to control for differences in cell composition across bulk tissue samples and to better determine which cell type(s) drive differential expression signal in bulk tissue data. However, these approaches require reference expression profiles for the cell types to be estimated, which can be difficult to generate from human postmortem brain tissue. Recent approaches have leveraged single-cell RNA sequencing (scRNA-seq) or single-nucleus RNA sequencing (snRNA-seq) datasets to construct these reference profiles and perform cellular deconvolution, particularly in peripheral tissues. While many statistical or machine learning approaches have been proposed, the majority produce similar composition estimates for a given reference dataset.
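To make the idea of reference-based deconvolution concrete, the following is a minimal sketch rather than any published method. It assumes a hypothetical reference matrix of marker-gene expression per cell type and models the bulk profile as a weighted sum of the reference columns. Ordinary least squares followed by clipping and renormalization stands in for the constrained solvers that real deconvolution tools use. All names and numbers are illustrative assumptions.

```python
import numpy as np


def estimate_fractions(bulk, reference):
    """Estimate mixture weights for one bulk sample.

    bulk:      array of shape (genes,), bulk expression per gene
    reference: array of shape (genes, cell_types), one profile per cell type

    Uses unconstrained least squares, then clips negatives and rescales
    so the weights sum to 1 (a simple stand-in for constrained solvers).
    """
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum()


# Hypothetical reference: 4 marker genes x 3 cell types (made-up values).
reference = np.array([
    [10.0, 1.0, 0.5],
    [0.5, 8.0, 1.0],
    [1.0, 0.5, 6.0],
    [3.0, 3.0, 3.0],
])

# Simulate a noise-free bulk sample from known mixture weights.
true_fractions = np.array([0.5, 0.3, 0.2])
bulk = reference @ true_fractions

estimated = estimate_fractions(bulk, reference)
print(estimated)
```

In this noise-free simulation the weights are recovered exactly; with real data, the estimate depends heavily on how well the reference profiles match the bulk samples.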
However, as we describe in this application, many of these existing reference datasets, regardless of the algorithm employed, are largely not comparable to the vast majority of bulk RNA sequencing data generated from postmortem human brain tissue, and have produced incorrect estimates of cellular composition. Current algorithms estimate the relative fraction of RNA attributable to each cell type, not the relative fraction of cells of each type. We therefore propose to generate a more comprehensive framework for performing cellular deconvolution in human postmortem RNA-seq data. This proposal will leverage the extensive bulk RNA sequencing performed over the past decade to better determine the relative role of cell type-specific expression in the human brain and its subsequent dysregulation in debilitating brain disorders.
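The distinction between RNA fractions and cell fractions can be illustrated with a small worked example. This is a sketch under stated assumptions, not the method proposed in this application: it assumes that each cell type contributes a known, fixed amount of RNA per cell (the hypothetical `rna_per_cell` values below), so that an RNA fraction can be converted to a cell fraction by dividing by that amount and renormalizing.

```python
import numpy as np


def rna_to_cell_fractions(rna_fractions, rna_per_cell):
    """Convert RNA fractions to cell fractions.

    Assumes each cell of type k contributes rna_per_cell[k] units of RNA,
    so cell counts are proportional to rna_fractions[k] / rna_per_cell[k].
    """
    weights = np.asarray(rna_fractions, dtype=float) / np.asarray(rna_per_cell, dtype=float)
    return weights / weights.sum()


# Hypothetical tissue: 40% neurons, 60% glia by cell count.
cell_fractions_true = np.array([0.4, 0.6])

# Hypothetical assumption: a neuron carries 3x the RNA of a glial cell.
rna_per_cell = np.array([3.0, 1.0])

# Fraction of total RNA contributed by each cell type.
rna_mass = cell_fractions_true * rna_per_cell
rna_fractions = rna_mass / rna_mass.sum()

recovered = rna_to_cell_fractions(rna_fractions, rna_per_cell)
print(rna_fractions, recovered)
```

Under these assumed values, neurons account for about two thirds of the RNA while making up only 40% of the cells, which is the kind of discrepancy that arises when RNA fractions are read as cell fractions.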

Public Health Relevance

Recent approaches have leveraged single-cell or single-nucleus RNA sequencing (RNA-seq) data to perform cellular deconvolution on homogenate datasets, but many current algorithms estimate the relative fraction of RNA attributable to each cell type, not the relative fraction of cells of each type. This proposal therefore aims to develop the novel reference and validation datasets necessary to implement cellular, rather than RNA, deconvolution of RNA-seq datasets, and to release the resulting methods as publicly available software. Finally, we will apply this novel deconvolution approach to thousands of homogenate RNA-seq samples to gain cell type-specific insights into debilitating brain disorders.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section
Special Emphasis Panel (ZMH1)
Program Officer
Arguello, Alexander
Lieber Institute, Inc.
United States