Currently a gap exists between the explosion of high-throughput data generation in molecular biology and the relatively slower growth of reliable functional information extracted from the data. This gap is largely due to the lack of specificity necessary for accurate gene function prediction in the currently available large-scale experimental technologies for rapid protein function assessment. Bioinformatics methods that integrate diverse data sources in their analysis achieve higher accuracy and thus alleviate this lack of specificity, but there's a paucity of generalizable, efficient, and accurate methods for data integration. In addition, no efficient methods exist to effectively display diverse genomic data, even though visualization has been very valuable for analysis of data from large scale technologies such as gene expression microarrays. The long-term goal of this proposal is to develop an accurate and generalizable bioinformatics framework for integrated analysis and visualization of heterogeneous biological data. We propose to address the data integration problem with a Bayesian network approach and effective visualization methods. We have shown the efficacy of this method in a proof-of-principle system that increased the accuracy of gene function prediction for Saccharomyces cerevisiae compared to individual data sources. Building on our previous work, we present a two-part plan to improve and expand our system and to develop novel visualization methods for genomic data based on the scalable display technology. First, we will investigate the computational and theoretical issues behind accurate integration, analysis and effective visualization of heterogeneous high-throughput data. Then, leveraging our existing system and algorithmic improvements developed in the first part of the project, we will design and implement a full-scale data integration and function prediction system for Saccharomyces cerevisiae that will be incorporated with the Saccharomyces Genome Database (SGD), a model organism database for yeast. The proposed system would provide highly accurate automatic function prediction that can accelerate genomic functional annotation through targeted experimental testing. Furthermore, our system will perform general integration and will offer researchers a unified view of the diverse high-throughput data through effective integration and visualization tools, thereby facilitating hypothesis generation and data analysis. Our scalable visualization methods will enable teams of researchers to examine biological data interactively and thus support the highly collaborative nature of genomic research. In addition to contributing to S. cerevisiae genomics, the technology for efficient and accurate heterogeneous data integration and visualization developed as a result of this proposal will form a basis for systems that address the same set of issues for other organisms, including the human.
Showing the most recent 10 out of 67 publications