The era of high-throughput genome sequencing has greatly increased the number of known protein sequences, but our knowledge of protein function has not kept pace; many newly discovered proteins are annotated as "function unknown". These genes, currently labeled as hypothetical, might support important cellular functions and could serve as important targets for medical, diagnostic, or pharmacological studies. At the same time, high-throughput functional genomics methods are generating data that shed light on protein functions, but it is difficult to integrate information from all of these sources in a comprehensive manner. We propose a systematic methodology for integrating multiple sources of gene function evidence based on probabilistic inference methods. We use a functional linkage graph representation of evidence, together with controlled vocabularies of function descriptors such as the Gene Ontology (GO), and develop methods to integrate and propagate function classifications through evidence networks to generate predicted functions for proteins of unknown function, with probabilities that indicate the confidence of each prediction. We propose to design, develop, validate, and disseminate software for predicting protein functions using our methods, and to make available a database of predictions that will be kept current with the available evidence on an ongoing basis.
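To make the idea of propagating function classifications through a functional linkage graph concrete, the following is a minimal sketch and not the project's actual inference method. It assumes each evidence source yields an independent probability that two proteins are functionally linked, combines those probabilities with a noisy-OR, and then spreads a single GO term from annotated proteins to unannotated neighbors. The function names, the `damping` parameter, and the protein identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def combine_evidence(probs):
    """Noisy-OR: probability that at least one evidence source reflects a
    true functional link, assuming the sources are independent."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


def build_graph(evidence):
    """Build an undirected functional linkage graph.

    evidence: iterable of (protein_a, protein_b, probability) triples,
    possibly with several triples per protein pair (one per source).
    """
    per_edge = defaultdict(list)
    for a, b, p in evidence:
        if a == b:
            continue  # ignore self-links
        per_edge[frozenset((a, b))].append(p)
    graph = defaultdict(dict)
    for pair, probs in per_edge.items():
        a, b = tuple(pair)
        weight = combine_evidence(probs)
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def propagate(graph, known, term, rounds=10, damping=0.5):
    """Estimate, for every protein in the graph, a probability of having `term`.

    known: dict mapping annotated proteins to their set of GO terms.
    Annotated proteins stay fixed at 0 or 1; each unannotated protein is
    updated with a noisy-OR over its neighbors' current beliefs.
    `damping` (an illustrative assumption) discounts indirect evidence.
    """
    belief = {p: (1.0 if term in known.get(p, set()) else 0.0) for p in graph}
    for _ in range(rounds):
        updated = {}
        for p in graph:
            if p in known:
                updated[p] = belief[p]
                continue
            p_none = 1.0
            for q, weight in graph[p].items():
                p_none *= 1.0 - damping * weight * belief[q]
            updated[p] = 1.0 - p_none
        belief = updated
    return belief


if __name__ == "__main__":
    # Hypothetical proteins and evidence values, for illustration only.
    evidence = [("A", "B", 0.5), ("A", "B", 0.5), ("B", "C", 0.8)]
    known = {"A": {"GO:0006412"}}
    graph = build_graph(evidence)
    for protein, prob in sorted(propagate(graph, known, "GO:0006412").items()):
        print(protein, round(prob, 3))
```

In this sketch the output values are heuristic scores in [0, 1] rather than calibrated posteriors; a full probabilistic treatment would need a properly specified model over the graph and validation against held-out annotations.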