This project focuses on reconstructing transcriptional regulatory networks by integrating data from perturbation screens with steady-state and time-course gene expression profiles. This is an important and challenging problem in functional genomics. Its importance stems from the key role regulatory networks play in our understanding of the inner workings of the cell and of its response to external stimuli and environmental changes. The challenges arise mainly from limitations in the available data. Specifically, data obtained from knock-out/knock-down experiments (perturbation screens) are usually limited in sample size, and thus potentially noisy, and in addition provide only indirect evidence about gene interactions. Observational data, in which the organism is profiled at steady state or over time, are more readily available, but their information content is usually inadequate for the task at hand. The proposed methodology is a novel computational approach that integrates these two data sources to solve the reconstruction problem. Specifically, the perturbation data are used to obtain causal orderings of the genes; such orderings determine to a large extent which genes can affect which others. Because regulatory networks contain feedback mechanisms and the perturbation data are potentially noisy, multiple causal orderings are consistent with the perturbation data, and a fast search algorithm is introduced to obtain them. Subsequently, the network links are estimated for each ordering through a computationally efficient penalized likelihood method, and only links that appear in the reconstructions with very high likelihood scores are included in a consensus graph. The proposed approach is technically rigorous, computationally scalable to large networks and, based on preliminary evidence, outperforms existing methods.
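The ordering-then-estimation pipeline described above can be sketched as follows. This is a minimal illustration written in Python, not the project's R implementation, and every name and parameter in it (`sample_orderings`, `fit_ordering`, `consensus_graph`, `lam`, `top_frac`, `min_freq`) is hypothetical. It assumes the perturbation screen has been summarized as a binary matrix `effects[i, j] = 1` when knocking out gene i changes the expression of gene j. In place of the project's fast search algorithm it uses a simple randomized greedy heuristic: repeatedly place a gene with the fewest incoming effects from genes not yet placed, breaking ties at random, so that feedback loops yield several distinct orderings. Given an ordering, each gene is lasso-regressed on the genes preceding it in the steady-state data, each fit is scored by a Gaussian log-likelihood, and edges that recur in the best-scoring fits form the consensus graph.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # current residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * beta[j]  # remove predictor j's contribution
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * beta[j]
    return beta


def sample_orderings(effects, n_orderings, rng):
    """Hypothetical greedy heuristic (not the project's search algorithm):
    place a gene with the fewest incoming effects from unplaced genes,
    breaking ties at random. Cycles in `effects` produce ties."""
    effects = np.array(effects, dtype=float)
    np.fill_diagonal(effects, 0)
    p = effects.shape[0]
    found = set()
    for _ in range(n_orderings):
        remaining, order = list(range(p)), []
        while remaining:
            indeg = [effects[remaining, j].sum() for j in remaining]
            best = min(indeg)
            ties = [g for g, d in zip(remaining, indeg) if d == best]
            pick = ties[rng.integers(len(ties))]
            remaining.remove(pick)
            order.append(pick)
        found.add(tuple(order))
    return list(found)


def fit_ordering(X, order, lam):
    """Lasso-regress each gene on its predecessors in `order`.
    Returns B (B[i, j] != 0 means edge i -> j) and a Gaussian log-likelihood."""
    n, p = X.shape
    B = np.zeros((p, p))
    loglik = 0.0
    for k, j in enumerate(order):
        parents = list(order[:k])
        resid = X[:, j]
        if parents:
            beta = lasso_cd(X[:, parents], X[:, j], lam)
            B[parents, j] = beta
            resid = X[:, j] - X[:, parents] @ beta
        loglik += -0.5 * n * np.log(resid @ resid / n + 1e-12)
    return B, loglik


def consensus_graph(X, effects, lam=0.1, n_orderings=50, top_frac=0.2,
                    min_freq=0.8, seed=0):
    """Keep edges present in at least `min_freq` of the top-scoring fits."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each gene
    fits = [fit_ordering(X, o, lam)
            for o in sample_orderings(effects, n_orderings, rng)]
    fits.sort(key=lambda fit: fit[1], reverse=True)
    top = fits[: max(1, int(top_frac * len(fits)))]
    freq = np.mean([B != 0 for B, _ in top], axis=0)
    return freq >= min_freq
```

For example, with a two-gene feedback loop, `sample_orderings(np.array([[0, 1], [1, 0]]), 50, np.random.default_rng(0))` returns both orderings `(0, 1)` and `(1, 0)`, whereas an acyclic `effects` matrix yields a single ordering. Restricting each regression to a gene's predecessors is what keeps the per-ordering estimate acyclic and cheap: it reduces to p independent lasso problems.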
Further, extensions to integrate time-course expression data are considered by employing the framework of network Granger causality. The proposed methodology will be validated both with in silico experiments and with real data obtained from our collaborators (see attached letters of support) and from publicly available sources. Note that the real data cover different organisms and different data sources. Finally, the computational methodology will be implemented in an open-source software tool that allows the research community to add methods that enhance network reconstructions. The software will be developed in the R programming language and will also contain executable code for the most computationally intensive components. It will also be implemented as a Taverna workflow, to aid dissemination to the biomedical research community and to allow scientists to share input data and workflow results, as well as to compare network reconstructions.
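Network Granger causality says that gene i Granger-causes gene j if past values of i help predict the current value of j beyond what the other genes' pasts already explain. One common way to estimate it is a lasso-penalized vector autoregression, sketched below in Python under stated assumptions: the time course is a T x p matrix sampled at equally spaced time points, the maximum lag `d` and penalty `lam` are hypothetical tuning parameters, and a plain lasso stands in for whatever penalty the project ultimately adopts. Each gene's value at time t is regressed on all genes at lags 1 through d, and any nonzero coefficient from gene i to gene j becomes an edge i -> j.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta


def granger_network(Y, d=2, lam=0.1):
    """Y: T x p time course (rows are time points).
    Returns A with A[k-1, i, j] = effect of gene i at lag k on gene j,
    and the p x p adjacency matrix of Granger-causal edges i -> j."""
    T, p = Y.shape
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # Row r of Z holds Y[t-1], ..., Y[t-d] for target time t = d + r.
    Z = np.hstack([Y[d - k:T - k] for k in range(1, d + 1)])
    target = Y[d:]
    A = np.zeros((d, p, p))
    for j in range(p):
        coef = lasso_cd(Z, target[:, j], lam)
        A[:, :, j] = coef.reshape(d, p)  # columns of Z are grouped by lag
    return A, np.any(A != 0, axis=0)
```

Because each target gene is fit separately, the problem decomposes into p lasso regressions over p·d lagged predictors, which is what makes this formulation scale to large networks; unlike the steady-state sketch, no causal ordering is needed, since time itself orders cause before effect.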
Reconstructing transcriptional regulatory networks is a challenging but important task in functional genomics. Regulatory networks provide information about the inner workings of genes in the cell, as well as a means to identify malfunctioning subnetworks in various disease states. Their reconstruction is a challenging computational problem, because all the available evidence from the different data sources is indirect, and the problem must be solved by appropriately piecing together limited information. Data from single-gene knock-out experiments provide information about the response of the organism to such limited perturbations, while gene expression levels measured with the organism at steady state or over time help to glean information about its internal workings. The aim of this research program is to develop computational methods that rigorously integrate the available data sources to produce superior network reconstructions. The methodology will be made available in an easy-to-use, open-source software tool that will help biomedical scientists carry out in silico experiments and take advantage of publicly available data and previous network reconstructions.