In the past decade, substantial progress has been made in the discovery of genetic variants and genes associated with risk for psychiatric disorders. Altered gene expression in the brain, particularly at the cell-type-specific level, is believed to be a driving factor through which these genetic variants confer risk. To link altered transcription to psychopathology, an immense amount of transcriptomic data is being accumulated, including single-cell and tissue-level transcriptomes, some of which cover critical developmental periods. Outstanding challenges are how to integrate single-cell and tissue-level transcriptomic data, and how to determine the ways in which genetic variation alters transcription in specific cells to produce psychopathology. In this high-dimensional 'omics setting, we need powerful statistical and machine learning tools to perform integrative analyses and to mesh their results with large psychiatric genetic datasets to achieve new insights. We propose to use our expertise in high-dimensional statistical inference to tackle these challenges. We go beyond machine learning models that specialize in prediction, focusing instead on providing interpretable statistical inferences. We identify gene communities, defined in terms of cell type and spatiotemporal window, that drive risk. With vast amounts of data comes great risk of spurious inferences based on non-rigorous analyses. On the other hand, reliable but naïve tools can sacrifice power by not fully integrating all available information. Our overall objective is to produce analytic tools that yield reliable and powerful inferences relating cell-type-specific gene expression to genetic risk factors. Once these analytic tools are available to the research community, our longer-term goal is to hasten discoveries in the field and thus build the foundation from which therapeutic targets for psychiatric disorders can emerge.
Our objectives will be accomplished through the following specific aims: 1) statistically rigorous methods to select cell-type markers and to estimate cell-type-specific (CTS) expression, which will facilitate downstream analyses, including detection of CTS eQTLs from tissue; 2) modeling of dynamic gene communities throughout the development of cell lineages or tissue, and relating them to community-based score statistics to gain insight into the impact of genetic risk factors on psychiatric disorders; and 3) novel methods for estimating gene co-expression networks from single-cell RNA-seq. This contribution is significant because it will make many transcriptomic resources more valuable and enable downstream analyses, such as detection of CTS eQTLs in larger sample sets with higher power. Dynamic network analysis tools will enhance our ability to identify gene communities that vary over developmental epochs, and this variation facilitates inferences that relate cell type and developmental period to risk factors. The proposed research is innovative, in our opinion, because it uses novel statistical methods for integrative analysis of data from multiple sources, and cutting-edge results to represent high-dimensional data in a meaningful way that lends itself to clustering and network analysis.

Public Health Relevance

The statistical tools and results arising from the proposed research are relevant to public health because they will generate a deeper understanding of the etiology of psychiatric disorders. Moreover, the refined methods and new results provided by our research will be directly useful for the research community as they hunt for ways to prevent or treat mental disorders.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section: Special Emphasis Panel (ZMH1)
Program Officer: Arguello, Alexander
Carnegie-Mellon University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States