The era of high-throughput genome sequencing has greatly increased the number of known protein sequences, but our knowledge of protein function has not kept pace; many newly discovered proteins are annotated as "function unknown". These genes, currently labeled as hypothetical, might support important cellular functions and could serve as important targets for medical, diagnostic, or pharmacological studies. At the same time, high-throughput functional genomics methods are generating data that shed light on protein functions, but it is difficult to integrate information from all of these sources in a comprehensive manner. We propose a systematic methodology for integrating multiple sources of gene function evidence based on probabilistic inference methods. We use a functional linkage graph representation of evidence, together with controlled vocabularies of function descriptors such as the Gene Ontology (GO), and develop methods to integrate and propagate function classifications through evidence networks to generate predicted functions for proteins of unknown function, with probabilities to indicate the confidence of each prediction. We propose to design, develop, validate, and disseminate software for predicting protein functions using our methods, and to make available a database of predictions that will be kept current with the available evidence on an ongoing basis.
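As an illustration only, the idea of propagating function classifications through a functional linkage graph can be sketched in a few lines of Python. This is not the project's probabilistic inference method: it substitutes a simple iterative weighted-neighbor average for a single function term, and the function name `propagate_function`, the edge format, and the example weights are all hypothetical. Proteins with a known annotation are held fixed, and each unannotated protein receives a score between 0 and 1 that can be read loosely as confidence that it carries the function.

```python
from collections import defaultdict


def propagate_function(edges, known, n_iter=50):
    """Spread one function label over a weighted linkage graph.

    edges: iterable of (protein_a, protein_b, weight), weight > 0 (assumed format)
    known: dict mapping annotated proteins to 1.0 (has function) or 0.0 (does not)
    Returns a dict mapping every protein in the graph to a score in [0, 1].
    """
    if not known:
        raise ValueError("at least one annotated protein is required")

    # Build an undirected adjacency list from the evidence edges.
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Start unannotated proteins at the fraction of annotated proteins
    # that carry the function.
    prior = sum(known.values()) / len(known)
    prob = {p: known.get(p, prior) for p in neighbors}

    for _ in range(n_iter):
        updated = {}
        for p, nbrs in neighbors.items():
            if p in known:
                # Annotated proteins stay clamped to their known label.
                updated[p] = known[p]
                continue
            total = sum(w for _, w in nbrs)
            # Weighted average of the neighbors' current scores.
            updated[p] = sum(w * prob[q] for q, w in nbrs) / total
        prob = updated
    return prob


if __name__ == "__main__":
    # Hypothetical evidence: X is linked strongly to A and weakly to C.
    edges = [("A", "X", 3.0), ("X", "C", 1.0)]
    known = {"A": 1.0, "C": 0.0}
    print(propagate_function(edges, known))
```

A full treatment along the lines described above would replace the averaging step with proper probabilistic inference over the graph and would handle many GO terms and several evidence types at once.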
Jiang, Xiaoyu; Gold, David; Kolaczyk, Eric D (2011) Network-based auto-probit modeling for protein function prediction. Biometrics 67:958-66
Mori, Marcelo A; Liu, Manway; Bezy, Olivier et al. (2010) A systems biology approach identifies inflammatory abnormalities between mouse strains prior to development of metabolic disease. Diabetes 59:2960-71
Wu, Chang-Jiun; Cai, Tianxi; Rikova, Klarisa et al. (2009) A predictive phosphorylation signature of lung cancer. PLoS One 4:e7994
Dotan-Cohen, Dikla; Letovsky, Stan; Melkman, Avraham A et al. (2009) Biological process linkage networks. PLoS One 4:e5313
Dotan-Cohen, Dikla; Kasif, Simon; Melkman, Avraham A (2009) Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering. Bioinformatics 25:1789-95
Jiang, Xiaoyu; Nariai, Naoki; Steffen, Martin et al. (2008) Integration of relational and hierarchical network information for protein function prediction. BMC Bioinformatics 9:350
Ghildiyal, Megha; Seitz, Herve; Horwich, Michael D et al. (2008) Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science 320:1077-81
Ku, Manching; Koche, Richard P; Rheinbay, Esther et al. (2008) Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet 4:e1000242
Anton, Brian P; Saleh, Lana; Benner, Jack S et al. (2008) RimO, a MiaB-like enzyme, methylthiolates the universally conserved Asp88 residue of ribosomal protein S12 in Escherichia coli. Proc Natl Acad Sci U S A 105:1826-31
Dotan-Cohen, Dikla; Melkman, Avraham A; Kasif, Simon (2007) Hierarchical tree snipping: clustering guided by prior knowledge. Bioinformatics 23:3335-42