Theoretical Foundations and Software Infrastructure for Biological Network Databases

Koyuturk, Mehmet

Abstract

Ever-increasing amounts of physical, functional, and statistical interaction data among biomolecules, including DNA regulatory regions, functional RNAs, proteins, metabolites, and lipids, as well as among genomic variants, offer unprecedented opportunities for computational discovery and for constructing a unified systems view of the cellular machinery. These data and associated formalisms have enabled systems approaches that have led to unique advances in biomedical sciences. However, storage schemes, data structures, representations, and query mechanisms for network data are considerably more complex than those for flat or low-dimensional data representations (e.g., sequences or molecular expression). This complexity is even more evident when we consider the heterogeneity of interactions that can occur in the cell. For example, a pair of protein-coding genes can interact in a variety of ways: i) a physical (protein-protein) interaction between their gene products, ii) an interaction of one gene's product with the promoter/enhancer/silencer region of the other gene, or iii) a genetic interaction, in which the double mutant has a phenotype significantly different from the combined effects of the single mutations. The complexity grows further when one considers different versions of datasets, different techniques used for assaying and gathering the interactions between molecules, linkages across data, and interfaces with other tools. This project seeks to answer a number of fundamental questions that relate to efficient utilization of large network-structured datasets: What are (provably) optimal storage schemes for large network-structured databases? How should multiple versions of the same or related datasets be stored? How does one trade off compression against query efficiency? And how does one suitably abstract network data so that users can interactively interrogate them using front-ends such as Cytoscape?
This project aims to answer these questions by developing theoretically grounded and computationally validated storage schemes, algorithms, and software that will enable efficient and effective storage, update, processing, and querying of biological networks. We will develop compression techniques for efficient storage, version control mechanisms that allow users to create their own versions of networks, algorithms for efficient query processing on these networks, and implementations of these algorithms in broadly accessible and user-friendly software. This research will result in novel computational tools that will be disseminated to the community as open-source, public-domain software. Our tools will render network data fundamentally more accessible to the broader community in biomedical sciences. This will make the use of network data more commonplace in applications including the identification of composite prognostic and diagnostic markers, disease gene prioritization, modeling of tumor heterogeneity and progression in cancers, informing treatment, identification of therapeutic targets, and drug repositioning. In these respects, the algorithms and software are expected to have far-reaching and deep impact.
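The storage and versioning goals described above can be illustrated with a minimal sketch. It is not the project's actual design: the class `VersionedNetwork`, its methods, and the gene names are hypothetical, chosen only to show how typed interactions might be stored as a base edge set plus per-version deltas.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

# An edge is (node_a, node_b, interaction_type). All names here are
# hypothetical placeholders, not taken from the project's software.
Edge = Tuple[str, str, str]


@dataclass
class VersionedNetwork:
    """Stores a base edge set plus one (added, removed) delta per version."""
    base: FrozenSet[Edge] = frozenset()
    deltas: List[Tuple[FrozenSet[Edge], FrozenSet[Edge]]] = field(default_factory=list)

    def commit(self, added: Iterable[Edge] = (), removed: Iterable[Edge] = ()) -> int:
        """Record a new version as a delta and return its version number."""
        self.deltas.append((frozenset(added), frozenset(removed)))
        return len(self.deltas)

    def edges_at(self, version: int) -> Set[Edge]:
        """Rebuild the edge set of a version by replaying deltas (0 = base)."""
        edges = set(self.base)
        for added, removed in self.deltas[:version]:
            edges -= removed
            edges |= added
        return edges

    def query(self, version: int, node: str, kind: Optional[str] = None) -> Set[Edge]:
        """Return edges touching `node` at `version`, optionally of one type."""
        return {
            e for e in self.edges_at(version)
            if node in e[:2] and (kind is None or e[2] == kind)
        }


# Illustrative usage with made-up gene names and interaction types.
net = VersionedNetwork(base=frozenset({("geneA", "geneB", "physical")}))
v1 = net.commit(added=[("geneA", "geneC", "regulatory")])
v2 = net.commit(removed=[("geneA", "geneB", "physical")])
physical_at_v1 = net.query(v1, "geneA", kind="physical")
```

Replaying deltas keeps storage small but makes each query cost grow with the number of versions. Storing periodic full snapshots would trade space for faster queries, which is one instance of the compression versus query-efficiency trade-off the abstract raises.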

Public Health Relevance

Biochemical networks provide a unified systems view of the cellular machinery in living organisms, but the complexity of network-structured data poses challenges in storing, analyzing, and querying large collections of networks. This project aims to develop compression techniques for efficient storage of 'big' network data, version control mechanisms that allow users to create their own versions of networks, and algorithms for efficient query processing on these networks. All these methods will be implemented in accessible software and made publicly available.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 5U01CA198941-02
Application #: 9070595
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Li, Jerry

Project Start: 2015-06-01
Project End: 2018-05-31
Budget Start: 2016-06-01
Budget End: 2017-05-31
Support Year: 2
Fiscal Year: 2016

Institution

Name: Case Western Reserve University
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 077758407

City: Cleveland
State: OH
Country: United States
Zip Code: 44106

Related projects


NIH 2017 U01 CA	Theoretical Foundations and Software Infrastructure for Biological Network Databases	Koyuturk, Mehmet / Case Western Reserve University	$436,692
NIH 2016 U01 CA	Theoretical Foundations and Software Infrastructure for Biological Network Databases	Koyuturk, Mehmet / Case Western Reserve University
NIH 2015 U01 CA	Theoretical Foundations and Software Infrastructure for Biological Network Databases	Koyuturk, Mehmet / Case Western Reserve University

Publications

Qiao, Shi; Koyuturk, Mehmet; Ozsoyoglu, Meral Z (2018) Querying of Disparate Association and Interaction Data in Biomedical Applications. IEEE/ACM Trans Comput Biol Bioinform 15:1052-1065

Maxwell, Sean; Chance, Mark R; Koyutürk, Mehmet (2017) Linearity of network proximity measures: implications for set-based queries and significance testing. Bioinformatics 33:1354-1361

Mohammadi, Shahin; Gleich, David F; Kolda, Tamara G et al. (2017) Triangular Alignment (TAME): A Tensor-Based Approach for Higher-Order Network Alignment. IEEE/ACM Trans Comput Biol Bioinform 14:1446-1458

Savel, Daniel; LaFramboise, Thomas; Grama, Ananth et al. (2017) Pluribus-Exploring the Limits of Error Correction Using a Suffix Tree. IEEE/ACM Trans Comput Biol Bioinform 14:1378-1388

Stanfield, Zachary; Coşkun, Mustafa; Koyutürk, Mehmet (2017) Drug Response Prediction as a Link Prediction Problem. Sci Rep 7:40321

Cowman, Tyler; Koyutürk, Mehmet (2017) Prioritizing tests of epistasis through hierarchical representation of genomic redundancies. Nucleic Acids Res 45:e131

Mukund, Kavitha; Subramaniam, Shankar (2017) Co-expression Network Approach Reveals Functional Similarities among Diseases Affecting Human Skeletal Muscle. Front Physiol 8:980

Ajami, Nassim E; Gupta, Shakti; Maurya, Mano R et al. (2017) Systems biology analysis of longitudinal functional response of endothelial cells to shear stress. Proc Natl Acad Sci U S A 114:10990-10995

Magner, Abram; Kihara, Daisuke; Szpankowski, Wojciech (2017) A Study of the Boltzmann Sequence-Structure Channel. Proc IEEE Inst Electr Electron Eng 105:286-305

Perez-Riverol, Yasset; Bai, Mingze; da Veiga Leprevost, Felipe et al. (2017) Discovering and linking public omics data sets using the Omics Discovery Index. Nat Biotechnol 35:406-409

Showing the most recent 10 out of 17 publications
