In this collaborative R01, "Networks from multidimensional data for schizophrenia and related disorders," submitted in response to RFA-MH-12-020, we propose to develop methods for integrating a broad range of genomic, imaging, and clinical data, and to host all data, methods, and results on a novel, flexible, and extensible computing platform. These data and methods will then be used to establish workflows, available to the research community, for integrating and mining the data for discovery. As a proof of concept, the methods will first be applied to multiple datasets for schizophrenia (SCZ) and then extended to additional mental disorders. Specifically, in AIM 1 we will adapt the Synapse platform at Sage Bionetworks to host, quality-control (QC), normalize, and transform data into an analysis-ready format. Synapse will also enable computation, storage, sharing, and integration of SCZ-specific data with pre-existing public data. The Sage platform will be hosted by the data center in the Institute of Genomics and Multiscale Biology at the Mount Sinai School of Medicine, consisting of a data warehouse (organized file systems and databases), a web service tier, and an applications tier adapted to facilitate network reconstruction and, more generally, model building with SCZ data.
In AIM 2, we will develop a pipeline of analytic methods that includes new and existing tools for the primary processing of multiple types of data. Using direct experimental findings, we will generate primary analysis datasets (e.g., expression QTLs, imaging QTLs, GWAS with SNP/CNV genotypes, RNA-seq signatures, and DNA methylation and RNA-seq associations), construct interaction networks with population-based expression and imaging datasets (e.g., gene expression, functional MRI, and structural MRI), transform all data and results into analysis-ready formats, and construct a standard set of queries to facilitate SCZ gene discovery.
In AIM 3, following platform development, generation of primary analysis datasets, and basic network construction, we will develop and apply methods to construct integrated, higher-order molecular networks and more generalized models to enhance our understanding of the genetic loci and gene networks underlying schizophrenia. Using a Bayesian framework, we will develop methods that identify network modules and the underlying genetic variance component (including epistatic interactions), incorporate prior disease information and extensive prior biological knowledge to construct more detailed probabilistic causal models, and identify causal regulators of networks associated with SCZ.
In AIM 4, we will assess the extent to which the models validate in independent SCZ data and in bipolar disorder and autism. This proposal should have a major impact on the field: it offers a solution, in the form of new platforms and analytic methods, to the bottleneck in gene discovery that results from our limited ability to fully analyze the data currently available on large samples of individuals suffering from mental illness. It will make possible the efficient use of this wealth of multidimensional data.

Public Health Relevance

In the United States, over a million people have schizophrenia. The costs are staggering in both human and financial terms. We propose to develop methods for integrating a broad range of genomic data into a novel, flexible, and extensible computing platform. These data will then be used to develop a pipeline of algorithms for integrating and mining the data. As a proof of concept, we will use multiple datasets for schizophrenia and then extend the approach to additional mental disorders.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section: Special Emphasis Panel (ZMH1-ERB-C (02))
Program Officer: Senthil, Geetha
University of Pennsylvania
Schools of Medicine
United States