The growing size and diversity of biological databases have necessitated the design of new scalable tools that can search across multiple databases and integrate information from multiple data sources. We propose to develop software for integrating and understanding protein-protein interactions, a fundamental problem in biology. The first goal is to develop a set of tools for constructing large-scale probabilistic networks of protein interactions from data sources such as microarrays, bioimages, GO annotations, genomic data, literature, and experimental data. The techniques will be based on Bayesian networks (BNs) and support vector machines (SVMs), and will be made scalable to large datasets. The second goal is to develop tools for analyzing interaction networks for pathway discovery, motif finding, and function identification. These tools will be based on current research in graph algorithms, bioinformatics, machine learning, and databases. We will target two model organisms: S. cerevisiae (yeast) and C. elegans (worm). The quality of the constructed networks will be evaluated against known protein interactions for these species. Scalability tests will be performed with the worm interactome, which is about ten times larger than the yeast interactome. The developed tools will be compatible with current standards and integrated into a database backend. The resulting software will enable assimilation of heterogeneous biological data, with the ultimate goal of increasing understanding of fundamental processes in molecular biology. The goal of this Phase I project is to prove the feasibility of constructing and analyzing probabilistic protein interaction networks in a scalable manner using new algorithms. The integration of diverse data sources such as microarrays, genomics, literature, and high-throughput experiments into pathways will facilitate the study of the biological processes behind human diseases.
The understanding of protein interactions within a pathway and interactions between pathways will lead to the selection of appropriate targets for therapeutic intervention, and eventually to cheaper and faster drug discovery.