CORE 1: THEORETICAL AND COMPUTATIONAL STUDIES

Project 1. Atomic-level molecular-interaction models: B. Honig, H. Bussemaker, D. Murray

Background: One of the long-range goals of MAGNet is the integration of structural information into all aspects of Systems Biology research. Our strategy is to develop a computational infrastructure that will facilitate the integration process and to apply our computational tools across the entire spectrum of MAGNet activities. Our efforts will involve the direct use of three-dimensional structural information, where available, and the extensive application of modeling techniques. The Honig and Murray labs have demonstrated the power of modeling in numerous collaborative studies (see (8, 9)). Our focus in this proposal is twofold. First, our recent discovery of the role of minor groove shape in protein-DNA recognition (7) opens up a broad range of questions to be addressed (see also DBP 1). Second, we will develop databases and models that allow us to predict the structure of proteins involved in different pathways, with a focus on the computational prediction of protein-protein and protein-membrane interactions. These tools will impact projects throughout MAGNet, with specific emphasis on modeling cancer-related pathways.

Project 2. Regulatory Molecular-Interaction Model: A. Califano, D. Anastassiou, D. Pe'er, D. Vitkup

Background: Regulatory networks are becoming increasingly valuable in the elucidation of cellular function and of its dysregulation in disease (5, 6, 22-24). Additionally, integrative genetical-genomics models that use genetics to inform causality in regulatory models have been successfully used to elucidate determinants of mammalian traits, which have been experimentally validated (25). Yet this area of investigation is still in its infancy, and significant non-incremental improvements are necessary before these tools and methodologies can be routinely used by biologists for the elucidation of physiological and pathological mechanisms.

Specifically, regulatory network models of higher eukaryotes are largely incomplete, lack context specificity, and, with few exceptions (21), address only one molecular-interaction layer: generally either the transcriptional layer (22) or the protein-protein interaction layer (26). Indeed, the vast majority of pathway models in use are assembled from the literature or from ex vivo data, such as yeast two-hybrid screens, and are thus both biased and not specific to the cellular context of interest. Not surprisingly, there are very few examples where unbiased computational interrogation of these models has led to the elucidation of novel biological mechanisms. Rather, these models are used as conceptual tools to explore broad associations between disease and network connectivity or to explore regulatory interactions surrounding genes of specific interest.

Similarly, paracrine and endocrine regulatory processes spanning multiple cell types, such as those driven by stroma-tumor (27), gut-bone (28), and glia-motor neuron (29) interactions, are virtually unmapped at a genome-wide level. Finally, several informative data modalities are poorly integrated in efforts to dissect molecular interactions. For instance, data on the structure-based specificity of protein-DNA and protein-protein interactions have not been systematically integrated with functional data to reverse-engineer regulatory networks.

Project 3. Genetic Variability Models: C. Wiggins, D. Pe'er, I. Pe'er, R. Rabadan

Background: The data-driven revolution currently transforming population genetics is the focus of the third theme in Core 1.
Abundant sequence data challenge our decades-old understanding of population genetics - particularly dynamics - and support new investigations to learn the mapping from microscopic genetic variation to macroscopic phenotypic (and disease) response. This theme spans from the fast evolutionary dynamics of small genomes (viruses) to population data of large human genomes, tied together via machine learning methods, which constrain and guide our understanding of population genetics and population dynamics. In the same way that microarray data spawned an entire new field of quantitative inquiry into transcriptional regulatory networks a decade ago, current advances in both technology (sequencing methods, in particular) and computation (algorithmic advances in regression, in particular) now allow learning the structure and even the dynamics of the genotype-phenotype relationship from data.

Project 4. Software Development: A. Floratos, A. Califano, G. Kaiser

Background: A key objective of the NCBC program in general and the MAGNet Center in particular is to facilitate the dissemination of advanced computational tools and data resources to the national and international biomedical research communities. Any software platform used for that purpose must address a complex set of challenges.

Integration and user-friendliness: The fundamentally integrative nature of modern biomedical research necessitates the combination of data from multiple genomic/biomedical databases and the use of an array of advanced analysis techniques (39). Making tools and resources accessible in an integrated and interoperable manner is a prerequisite for lowering the adoption barrier for biologists who are not computationally trained.

Knowledge sharing and collaboration: Moving beyond the traditional model of providing expert-driven support to bioinformatics tool users (through mailing lists, forums, knowledge bases, etc.), new approaches have emerged that seek to create communities of practice through activity awareness (40-42). Integrative tools can be daunting to use, and they stand to gain tremendously from the ability to automatically build "community memory" and enable knowledge sharing through the addition of transparent event (activity) collection, aggregation and mining facilities.

Seamless access to computational infrastructure: Due to their sheer size and dimensionality, genomic data sets can be computationally very demanding to analyze. It is unlikely that every biomedical researcher who would like to use such analyses will have access to local or institutional hardware resources capable of supporting their execution. It is therefore important to facilitate the sharing of public infrastructure, such as grid computing (43, 44).

Integration into the national biomedical computing environment: It is becoming increasingly evident that, to maximize the impact of analytical and data resources in biomedical research, it is desirable to expose them programmatically in a semantically aware manner (45, 46). The combination of programmatic accessibility and semantic clarity not only provides a level of self-documentation that increases usability and quality control but also encourages the creative incorporation of these resources into shareable workflows and innovative analysis and visualization tools (47-51).

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZRG1-BST-K)
Columbia University (N.Y.)
New York
United States
Kushwaha, Ritu; Jagadish, Nirmala; Kustagi, Manjunath et al. (2015) Interrogation of a context-specific transcription factor network identifies novel regulators of pluripotency. Stem Cells 33:367-77
Sanchez-Garcia, Félix; Villagrasa, Patricia; Matsui, Junji et al. (2014) Integration of genomic data enables selective discovery of breast cancer drivers. Cell 159:1461-75
Messina, Monica; Del Giudice, Ilaria; Khiabanian, Hossein et al. (2014) Genetic lesions associated with chronic lymphocytic leukemia chemo-refractoriness. Blood 123:2378-88
Lee, Eunjee; de Ridder, Jeroen; Kool, Jaap et al. (2014) Identifying regulatory mechanisms underlying tumorigenesis using locus expression signature analysis. Proc Natl Acad Sci U S A 111:5747-52
Sonabend, Adam M; Carminucci, Arthur S; Amendolara, Benjamin et al. (2014) Convection-enhanced delivery of etoposide is effective against murine proneural glioblastoma. Neuro Oncol 16:1210-9
van Arensbergen, Joris; van Steensel, Bas; Bussemaker, Harmen J (2014) In search of the determinants of enhancer-promoter interaction specificity. Trends Cell Biol 24:695-702
Repunte-Canonigo, Vez; Lefebvre, Celine; George, Olivier et al. (2014) Gene expression changes consistent with neuroAIDS and impaired working memory in HIV-1 transgenic rats. Mol Neurodegener 9:26
Johnson, Stephanie; van de Meent, Jan-Willem; Phillips, Rob et al. (2014) Multiple LacI-mediated loops revealed by Bayesian statistics and tethered particle motion. Nucleic Acids Res 42:10265-77
Pefanis, Evangelos; Wang, Jiguang; Rothschild, Gerson et al. (2014) Noncoding RNA transcription targets AID to divergently transcribed loci in B cells. Nature 514:389-93
Ward, Lucas D; Wang, Junbai; Bussemaker, Harmen J (2014) Characterizing a collective and dynamic component of chromatin immunoprecipitation enrichment profiles in yeast. BMC Genomics 15:494

Showing the most recent 10 out of 178 publications