Rapid advances in biotechnology are producing biological interaction data, such as protein-protein and gene-gene interaction networks, at an unprecedented pace, providing a powerful new resource and allowing old, yet important, biological questions to be reformulated in a new context. The size and complexity of these new types of data pose great challenges for experimental and computational biologists alike. Addressing these challenges has been a primary focus of much research under the umbrella term of systems biology. However, almost no work has been done on providing tools for the simultaneous evolutionary analysis of genomic and interactomic data. This project will delineate the significant impact such a simultaneous analysis can have on understanding and analyzing biological interaction networks, and will explore new methodologies for conducting these computational analyses. In particular, the project will address two areas that will help shed light on interaction networks and their complexity:
1. Novel genome-interactome evolutionary models. Coalescent theory has been one of the central models for establishing the relationships between gene genealogies and species phylogenies. In its current form, this theory does not allow for modeling events that arise in genomic studies, such as gene duplication and loss, and it has not been used to explain interaction network evolution. This research will extend coalescent theory to model genome-scale evolutionary events, and will develop a new unified framework for modeling the simultaneous evolution of genomic and interactomic data.
2. Novel stochastic modeling and inference using graph grammars. Stochastic models, such as hidden Markov models and stochastic context-free grammars, have been used extensively in the analysis of biological sequence data. However, no equivalent models have been introduced for the analysis of interaction networks. This research will explore new applications of stochastic graph grammars, as well as ways in which these models can be used to provide insightful analyses of interaction networks.
Broad Impact
Situated at the intersection of cellular, molecular, and evolutionary biology, this work will have a significant impact on the development and application of computational tools such as stochastic graph grammars and dissimilarity measures. The project will provide opportunities for training students in an interdisciplinary area, and will result in the development of new courses focused on the evolutionary analysis of biological networks. The interdisciplinary nature of the proposed work will help recruit students from traditionally under-represented groups to computer science. The project's methodologies will be implemented in software packages and made available through open-source mechanisms.