Gene regulation is an important determinant of the complex specialization of cells in the human brain, and nucleotide changes within regulatory elements contribute to risk for psychiatric disorders. We therefore hypothesize that these debilitating diseases are driven in part by genetic variants that alter gene expression and disturb the balance and function of cell types in brain tissue. Single-cell open chromatin assays are a promising approach to testing this hypothesis by mapping variants to regulatory elements that are specific to, or shared across, cell populations. There are two major barriers to this strategy, for which our project proposes modeling solutions. First, despite being the best currently available assay, single-cell ATAC-sequencing (scATAC-seq) suffers from low resolution, meaning that an open chromatin region may be supported by zero or few reads in a given cell. This makes it hard to identify coherent cell populations. We propose a network model for semi-supervised clustering of cells in scATAC-seq that leverages information from higher-coverage bulk tissue experiments and, if available, single-cell RNA-sequencing (scRNA-seq). The expected outcomes from applying this model to compendia of brain data from public repositories and our collaborators are (i) identification of open chromatin regions that differentiate cell types and states, and (ii) discovery of resolved cell populations whose open chromatin is enriched for psychiatric disorder-associated genetic variants. These results alone may not be enough to develop a mechanistic understanding of how variants impact brain function. To address this second challenge, we will implement a computationally efficient machine-learning framework for predicting the specific regulatory functions of single-cell open chromatin regions identified by our network model and other approaches.
Gene regulatory enhancers are particularly amenable to this approach, because high-throughput mouse transgenics and massively parallel reporter assays have generated enough validated enhancers for supervised learning. Our framework will be easy to apply to other regulatory functions, such as insulating boundaries in chromatin capture data. By developing a compressed, yet flexible, featurization of massive bulk and single-cell data compendia, we will enable rapid iteration of computationally intensive prediction algorithms applied to single-cell open chromatin regions. Our approach will also incorporate transfer learning from data-rich settings (e.g., postmortem or mouse brains) to data-poor settings (e.g., human late-gestation brains). We expect predicted regulatory elements to be more enriched for psychiatric disorder genetic risk, to provide mechanistic insight regarding how variants cause disease, and to be useful molecular tools. Together, our two proposed computational approaches will leverage the complementary strengths of bulk and single-cell data to resolve regulatory elements that drive the exquisite diversity of cells in developing and adult brains, towards mapping the non-coding contribution to psychiatric disease.

Public Health Relevance

The human brain is a complex tissue composed of diverse cell types with distinct gene regulation and function, making it difficult to mechanistically link genetic mutations to differences in brain health. This project will develop a network model and machine-learning framework that leverage single-cell and bulk genomics data to identify genome sequences that control gene expression in specific cell populations and states. We hypothesize that these cell-type-resolved regulatory elements will shed light on how sequence variants outside protein-coding genes increase risk for psychiatric disorders.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Mental Health (NIMH)
Type: Research Project (R01)
Project #: 1R01MH123178-01
Application #: 10007660
Study Section: Special Emphasis Panel (ZMH1)
Program Officer: Arguello, Alexander
Project Start: 2020-04-15
Project End: 2024-02-29
Budget Start: 2020-04-15
Budget End: 2021-02-28
Support Year: 1
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: J. David Gladstone Institutes
Department:
Type:
DUNS #: 099992430
City: San Francisco
State: CA
Country: United States
Zip Code: 94158