The era of high-throughput genome sequencing has greatly increased the number of known protein sequences, but our knowledge of protein function has not kept pace; many newly discovered proteins are annotated as "function unknown". These genes, currently labeled as hypothetical, may support important cellular functions and could serve as targets for medical, diagnostic, or pharmacological studies. At the same time, high-throughput functional genomics methods are generating data that shed light on protein function, but it is difficult to integrate information from all of these sources in a comprehensive manner. We propose a systematic methodology for integrating multiple sources of gene function evidence based on probabilistic inference methods. We use a functional linkage graph representation of evidence, together with controlled vocabularies of function descriptors such as the Gene Ontology (GO). We then develop methods to integrate and propagate function classifications through these evidence networks, generating predicted functions for proteins of unknown function along with probabilities that indicate the confidence of each prediction. We propose to design, develop, validate, and disseminate software for predicting protein functions using our methods, and to make available a database of predictions that will be kept current with the available evidence on an ongoing basis.
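To make the idea of integrating evidence into a functional linkage graph and propagating annotations through it more concrete, the following is a minimal sketch under stated assumptions. It is not the method developed in this project. Each evidence source (for example, co-expression or physical interaction) is assumed to give a confidence in [0, 1] that two proteins share function. Sources are assumed independent and are merged with a noisy-OR. Scores for a single GO term are then spread by simple iterative neighbor averaging (a basic label-propagation heuristic), with annotated proteins held fixed. The function names, the `prior` and `iterations` parameters, and the protein IDs are all hypothetical. The resulting scores lie in [0, 1] but are not calibrated probabilities.

```python
from typing import Dict, List, Tuple

# Hypothetical evidence format: protein pair -> confidence in [0, 1]
# that the two proteins share a function.
Evidence = Dict[Tuple[str, str], float]


def combine_evidence(sources: List[Evidence]) -> Dict[Tuple[str, ...], float]:
    """Merge evidence sources into one weighted functional linkage graph.

    Assumes sources are independent and combines them with a noisy-OR:
    w = 1 - prod_k (1 - c_k).
    """
    combined: Dict[Tuple[str, ...], float] = {}
    for source in sources:
        for (a, b), conf in source.items():
            key = tuple(sorted((a, b)))  # treat edges as undirected
            prev = combined.get(key, 0.0)
            combined[key] = 1.0 - (1.0 - prev) * (1.0 - conf)
    return combined


def propagate(
    edges: Dict[Tuple[str, ...], float],
    known: Dict[str, float],
    prior: float = 0.1,
    iterations: int = 50,
) -> Dict[str, float]:
    """Score each protein for one GO term by iterative neighbor averaging.

    Annotated proteins in `known` (1.0 = has the term, 0.0 = lacks it) stay
    fixed; every other protein starts at `prior` and repeatedly takes the
    edge-weighted mean of its neighbors' scores from the previous round.
    """
    neighbors: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), w in edges.items():
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    scores = {p: known.get(p, prior) for p in neighbors}
    for p, v in known.items():  # keep annotated proteins that have no edges
        scores.setdefault(p, v)

    for _ in range(iterations):
        updated = dict(scores)
        for p, nbrs in neighbors.items():
            if p in known:
                continue  # existing annotations are not overwritten
            total = sum(w for _, w in nbrs)
            if total > 0:
                updated[p] = sum(w * scores[q] for q, w in nbrs) / total
        scores = updated
    return scores


if __name__ == "__main__":
    coexpression = {("P1", "P2"): 0.6, ("P2", "P3"): 0.5}
    interactions = {("P1", "P2"): 0.5, ("P3", "P4"): 0.9}
    edges = combine_evidence([coexpression, interactions])
    scores = propagate(edges, known={"P1": 1.0, "P4": 0.0})
    for protein in sorted(scores):
        print(protein, round(scores[protein], 3))
```

In this toy chain, the P1-P2 link is supported by two sources and combines to 0.8. As a result, P2, which sits next to the positively annotated P1, ends near 0.71, while P3, which sits next to the negatively annotated P4, ends near 0.25. A probabilistic model would replace the averaging step with an explicit likelihood, but the graph construction and the clamping of known annotations would follow the same pattern.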

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG003367-01A1
Application #
6917419
Study Section
Special Emphasis Panel (ZRG1-BDMA (01))
Program Officer
Bonazzi, Vivien
Project Start
2005-09-12
Project End
2008-08-31
Budget Start
2005-09-12
Budget End
2006-08-31
Support Year
1
Fiscal Year
2005
Total Cost
$350,000
Indirect Cost
Name
Boston University
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
049435266
City
Boston
State
MA
Country
United States
Zip Code
02215
Jiang, Xiaoyu; Gold, David; Kolaczyk, Eric D (2011) Network-based auto-probit modeling for protein function prediction. Biometrics 67:958-66
Mori, Marcelo A; Liu, Manway; Bezy, Olivier et al. (2010) A systems biology approach identifies inflammatory abnormalities between mouse strains prior to development of metabolic disease. Diabetes 59:2960-71
Wu, Chang-Jiun; Cai, Tianxi; Rikova, Klarisa et al. (2009) A predictive phosphorylation signature of lung cancer. PLoS One 4:e7994
Dotan-Cohen, Dikla; Letovsky, Stan; Melkman, Avraham A et al. (2009) Biological process linkage networks. PLoS One 4:e5313
Dotan-Cohen, Dikla; Kasif, Simon; Melkman, Avraham A (2009) Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering. Bioinformatics 25:1789-95
Jiang, Xiaoyu; Nariai, Naoki; Steffen, Martin et al. (2008) Integration of relational and hierarchical network information for protein function prediction. BMC Bioinformatics 9:350
Ghildiyal, Megha; Seitz, Herve; Horwich, Michael D et al. (2008) Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science 320:1077-81
Ku, Manching; Koche, Richard P; Rheinbay, Esther et al. (2008) Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet 4:e1000242
Anton, Brian P; Saleh, Lana; Benner, Jack S et al. (2008) RimO, a MiaB-like enzyme, methylthiolates the universally conserved Asp88 residue of ribosomal protein S12 in Escherichia coli. Proc Natl Acad Sci U S A 105:1826-31
Dotan-Cohen, Dikla; Melkman, Avraham A; Kasif, Simon (2007) Hierarchical tree snipping: clustering guided by prior knowledge. Bioinformatics 23:3335-42

Showing the most recent 10 out of 14 publications