Within many environments, there are hundreds or thousands of microbial species that interact in a community. These communities can be determined from metagenomic sequencing projects in which DNA is collected from an environment (a sample of seawater, a patch of soil, etc.). The communities can be represented using ecological networks where nodes correspond to species and edges represent inferred or known relationships between the microbes (competition, symbiosis, etc.). Microbial communities in different environments or over time can then be compared by comparing their networks. One computational approach for this comparison is to align the networks by finding regions of local similarity, a new application of network alignment. However, network alignment is a computationally challenging problem for which there are no algorithms that have sufficient speed, accuracy, and generality.
This project will develop improved algorithms for network alignment to compare microbial ecological networks and for other applications in computational biology. Network alignment has found wide adoption in computational biology to compare biological pathways, to correct errors in networks, and to form hypotheses about the roles of genes with unknown function.
Intellectual Merit Improved alignments between networks for different environments or time points will help identify conserved patterns of interactions, truly functional relationships between microbes, and bacteria that are performing similar functions within different microbial communities. The algorithms developed by this project will also lead to more accurate prediction of protein function and protein interactions.
The approach taken by the project will use two primary algorithmic innovations, which will lead to substantially more useful local network aligners. The first is the development of a new signature for describing the similarities between network nodes based on their connections and attributes and the connections and attributes of other nearby nodes. This signature is based on the eigenvalues of subgraphs representing regions of the network around each node. Preliminary work has shown that such a multiscale spectral signature results in global network alignments that are more accurate, and more efficiently computed, than those using other approaches. The second main algorithmic innovation for this project is the explicit modeling of network alignment as an optimization problem with multiple objectives. This will allow the new aligners to handle the often-competing requirements on alignments so that they can find regions of networks that have, for example, genes with similar sequences and similar interaction partners. These innovations will lead to more accurate, high-quality alignments yielding new biological insight.
Broader Impact This project has three aspects to its broader impact. An educational tablet application will be developed that will teach graph theory to high-school students. This interactive application will introduce a beautiful, approachable, yet sophisticated, branch of mathematics to a group of students who often would not have the chance to study it. In addition, the project personnel will participate in programs at Carnegie Mellon University that aim to introduce middle-school girls to technical topics by presenting the new techniques. Finally, while the focus is on biological applications, the techniques and software developed as part of this project are expected to be useful in graphics and vision applications, such as object recognition in photos.