CORE 1: THEORETICAL AND COMPUTATIONAL STUDIES

Project 1. Atomic-level molecular-interaction models: B.Honig, H.Bussemaker, D.Murray.

Background: One of the long-range goals of MAGNet is the integration of structural information in all aspects of Systems Biology research. Our strategy is to develop a computational infrastructure that will facilitate the integration process and to apply our computational tools across the entire spectrum of MAGNet activities. Our efforts will involve the direct use of three-dimensional structural information, where available, and the extensive application of modeling techniques. The Honig and Murray labs have demonstrated the power of modeling in numerous collaborative studies (see (8, 9)). Our focus in this proposal is twofold. First, our recent discovery of the role of minor groove shape in protein-DNA recognition (7) opens up a broad range of questions to be addressed (see also DBP 1). Second, we will develop databases and models that allow us to predict the structure of proteins involved in different pathways, with a focus on the computational prediction of protein-protein and protein-membrane interactions. These tools will impact projects throughout MAGNet, with specific emphasis on modeling cancer-related pathways.

Project 2. Regulatory Molecular-Interaction Model: A.Califano, D.Anastassiou, D.Pe'er, D.Vitkup

Background: Regulatory networks are becoming increasingly valuable in the elucidation of cellular function and of its dysregulation in disease (5, 6, 22-24). Additionally, integrative genetical-genomics models, which use genetics to inform causality in regulatory models, have been successfully used to elucidate experimentally validated determinants of mammalian traits (25).
Yet this area of investigation is still in its infancy, and significant non-incremental improvements are necessary before these tools and methodologies can be routinely used by biologists for the elucidation of physiological and pathological mechanisms. Specifically, regulatory network models of higher eukaryotes are largely incomplete, lack context specificity, and, with few exceptions (21), address only one molecular-interaction layer: generally either the transcriptional (22) or the protein-protein interaction layer (26). Indeed, the vast majority of pathway models used in the literature are assembled from the literature or from ex vivo data, such as yeast two-hybrid assays, and are thus both biased and not specific to the cellular context of interest. Not surprisingly, there are very few examples where unbiased computational interrogation of these models has led to the elucidation of novel biological mechanisms. Rather, these models are used as conceptual tools to explore broad associations between disease and network connectivity or to explore regulatory interactions surrounding genes of specific interest. Similarly, paracrine and endocrine regulatory processes spanning multiple cell types, such as those driven by stroma-tumor (27), gut-bone (28), and glia-motor neuron (29) interactions, are virtually unmapped at a genome-wide level. Finally, several informative data modalities are poorly integrated in efforts to dissect molecular interactions. For instance, data on structure-based specificity of protein-DNA and protein-protein interactions have not been systematically integrated with functional data to reverse-engineer regulatory networks.

Project 3. Genetic Variability Models: C.Wiggins, D.Pe'er, I.Pe'er, R.Rabadan

Background: The data-driven revolution currently transforming population genetics is the focus of the third theme in Core 1.
Abundant sequence data challenge our decades-old understanding of population genetics, particularly its dynamics, and support new investigations to learn the mapping from microscopic genetic variation to macroscopic phenotypic (and disease) response. This theme spans from the fast evolutionary dynamics of small genomes (viruses) to population data of large human genomes, tied together via machine learning methods, which constrain and guide our understanding of population genetics and population dynamics. In the same way that microarray data spawned an entire new field of quantitative inquiry into transcriptional regulatory networks a decade ago, current advances in both technology (sequencing methods, in particular) and computation (algorithmic advances in regression, in particular) now allow learning the structure and even dynamics of the genotype-phenotype relationship from data.

Project 4. Software Development: A.Floratos, A.Califano, G.Kaiser

Background: A key objective of the NCBC program in general and the MAGNet Center in particular is to facilitate the dissemination of advanced computational tools and data resources to the national and international biomedical research communities. Any software platform used for that purpose must address a complex set of challenges.

Integration and user-friendliness: The fundamentally integrative nature of modern biomedical research necessitates the combination of data from multiple genomic/biomedical databases and the use of an array of advanced analysis techniques (39). Making tools and resources accessible in an integrated and interoperable manner is a prerequisite for lowering the adoption barrier for biologists who are not computationally trained.

Knowledge sharing and collaboration: Moving beyond the traditional model of providing expert-driven support to bioinformatics tool users (through mailing lists, forums, knowledge bases, etc.),
new approaches have emerged that seek to create communities of practice through activity awareness (40-42). Integrative tools can be daunting to use, and they stand to gain tremendously from the ability to automatically build "community memory" and enable knowledge sharing through the addition of transparent event (activity) collection, aggregation and mining facilities.

Seamless access to computational infrastructure: Due to their sheer size and dimensionality, genomic data sets can be computationally very demanding to analyze. It is unlikely that every biomedical researcher who would like to use such analyses will have access to local or institutional hardware resources capable of supporting their execution. It is therefore extremely important to facilitate sharing of public infrastructure, such as grid computing (43, 44).

Integration into the national biomedical computing environment: It is becoming increasingly evident that, to maximize the impact of analytical and data resources in biomedical research, it is desirable to expose them programmatically in a semantically aware manner (45, 46). The combination of programmatic accessibility and semantic clarity not only provides a level of self-documentation that increases usability and quality control but also encourages the creative incorporation of these resources into shareable workflows and innovative analysis and visualization tools (47-51).

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 5U54CA121852-10
Application #: 8707390
Study Section: Special Emphasis Panel (ZRG1-BST-K)
Project Start:
Project End:
Budget Start: 2014-08-01
Budget End: 2015-07-31
Support Year: 10
Fiscal Year: 2014
Total Cost: $1,435,989
Indirect Cost: $225,774
Name: Columbia University (N.Y.)
Department:
Type:
DUNS #: 621889815
City: New York
State: NY
Country: United States
Zip Code: 10032
Hui, Ken Y; Fernandez-Hernandez, Heriberto; Hu, Jianzhong et al. (2018) Functional variants in the LRRK2 gene confer shared effects on risk for Crohn's disease and Parkinson's disease. Sci Transl Med 10:
Azad, Robert N; Zafiropoulos, Dana; Ober, Douglas et al. (2018) Experimental maps of DNA structure at nucleotide resolution distinguish intrinsic from protein-induced DNA deformations. Nucleic Acids Res 46:2636-2647
Abe, Takayuki; Lee, Albert; Sitharam, Ramaswami et al. (2017) Germ-Cell-Specific Inflammasome Component NLRP14 Negatively Regulates Cytosolic Nucleic Acid Sensing to Promote Fertilization. Immunity 46:621-634
Alvarez, Mariano J; Shen, Yao; Giorgi, Federico M et al. (2016) Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet 48:838-47
Sheng, Ren; Jung, Da-Jung; Silkov, Antonina et al. (2016) Lipids Regulate Lck Protein Activity through Their Interactions with the Lck Src Homology 2 Domain. J Biol Chem 291:17639-50
Lachmann, Alexander; Giorgi, Federico M; Lopez, Gonzalo et al. (2016) ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information. Bioinformatics 32:2233-5
Park, Mi-Jeong; Sheng, Ren; Silkov, Antonina et al. (2016) SH2 Domains Serve as Lipid-Binding Modules for pTyr-Signaling Proteins. Mol Cell 62:7-20
Baskovich, Brett; Hiraki, Susan; Upadhyay, Kinnari et al. (2016) Expanded genetic screening panel for the Ashkenazi Jewish population. Genet Med 18:522-8
Bisikirska, Brygida; Bansal, Mukesh; Shen, Yao et al. (2016) Elucidation and Pharmacological Targeting of Novel Molecular Drivers of Follicular Lymphoma Progression. Cancer Res 76:664-74
Del Giudice, Ilaria; Marinelli, Marilisa; Wang, Jiguang et al. (2016) Inter- and intra-patient clonal and subclonal heterogeneity of chronic lymphocytic leukaemia: evidences from circulating and lymph nodal compartments. Br J Haematol 172:371-383

Showing the most recent 10 out of 258 publications