The era of high-throughput genome sequencing has greatly increased the number of known protein sequences, but our knowledge of protein function has not kept pace; many newly discovered proteins are annotated as "function unknown". These genes, currently labeled as hypothetical, might support important cellular functions and could serve as important targets for medical, diagnostic, or pharmacological studies. At the same time, high-throughput functional genomics methods are generating data that shed light on protein functions, but it is difficult to integrate information from all of these sources in a comprehensive manner. We propose a systematic methodology for integrating multiple sources of gene function evidence based on probabilistic inference methods. We use a functional linkage graph representation of evidence, together with controlled vocabularies of function descriptors such as the Gene Ontology (GO), and develop methods to integrate and propagate function classifications through evidence networks to generate predicted functions for proteins of unknown function, with probabilities to indicate the confidence of each prediction. We propose to design, develop, validate, and disseminate software for predicting protein functions using our methods, and to make available a database of predictions that will be kept current with the available evidence on an ongoing basis.
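
To make the idea of propagating function labels through a functional linkage graph concrete, here is a minimal sketch in Python. It is not the project's method: the abstract does not specify the inference procedure, so the sketch substitutes a simple iterative weighted-neighbor average. The function name `propagate`, the protein identifiers, the edge weights, and the `prior` and `prior_weight` parameters are all hypothetical, and the resulting scores are heuristic confidence values rather than calibrated probabilities.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def propagate(
    edges: Dict[Edge, float],
    known: Dict[str, float],
    prior: float = 0.1,
    prior_weight: float = 1.0,
    iterations: int = 20,
) -> Dict[str, float]:
    """Spread scores for ONE function term over a weighted, undirected graph.

    edges: (protein_a, protein_b) -> linkage weight (hypothetical evidence strength)
    known: protein -> 1.0 if annotated with the term, 0.0 if annotated without it
    prior: assumed background score for a protein with no evidence
    """
    # Build a symmetric adjacency map from the edge list.
    neighbors: Dict[str, Dict[str, float]] = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight
    for protein in known:
        neighbors.setdefault(protein, {})

    # Annotated proteins start at their label; all others start at the prior.
    scores = {p: known.get(p, prior) for p in neighbors}

    for _ in range(iterations):
        updated = {}
        for protein, nbrs in neighbors.items():
            if protein in known:
                # Annotated proteins stay clamped to their label.
                updated[protein] = known[protein]
                continue
            # Weighted average of neighbor scores, pulled toward the prior.
            weighted_sum = sum(w * scores[q] for q, w in nbrs.items())
            total_weight = sum(nbrs.values())
            updated[protein] = (prior_weight * prior + weighted_sum) / (
                prior_weight + total_weight
            )
        scores = updated
    return scores


# Hypothetical toy network: P1 has the function, P4 does not, P2 and P3 are unknown.
example_edges = {("P1", "P2"): 0.9, ("P2", "P3"): 0.8, ("P3", "P4"): 0.2}
example_known = {"P1": 1.0, "P4": 0.0}
example_scores = propagate(example_edges, example_known)
```

In this toy network, P2 is strongly linked to the annotated protein P1 and therefore ends with a higher score than P3, which sits closer to the unannotated protein P4. A real system would run one such pass per GO term and would need to combine several evidence types into the edge weights.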

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG003367-03
Application #
7282580
Study Section
Special Emphasis Panel (ZRG1-BDMA (01))
Program Officer
Bonazzi, Vivien
Project Start
2005-09-12
Project End
2010-08-31
Budget Start
2007-09-01
Budget End
2010-08-31
Support Year
3
Fiscal Year
2007
Total Cost
$331,863
Indirect Cost
Name
Boston University
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
049435266
City
Boston
State
MA
Country
United States
Zip Code
02215
Jiang, Xiaoyu; Gold, David; Kolaczyk, Eric D (2011) Network-based auto-probit modeling for protein function prediction. Biometrics 67:958-66
Mori, Marcelo A; Liu, Manway; Bezy, Olivier et al. (2010) A systems biology approach identifies inflammatory abnormalities between mouse strains prior to development of metabolic disease. Diabetes 59:2960-71
Wu, Chang-Jiun; Cai, Tianxi; Rikova, Klarisa et al. (2009) A predictive phosphorylation signature of lung cancer. PLoS One 4:e7994
Dotan-Cohen, Dikla; Letovsky, Stan; Melkman, Avraham A et al. (2009) Biological process linkage networks. PLoS One 4:e5313
Dotan-Cohen, Dikla; Kasif, Simon; Melkman, Avraham A (2009) Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering. Bioinformatics 25:1789-95
Anton, Brian P; Saleh, Lana; Benner, Jack S et al. (2008) RimO, a MiaB-like enzyme, methylthiolates the universally conserved Asp88 residue of ribosomal protein S12 in Escherichia coli. Proc Natl Acad Sci U S A 105:1826-31
Jiang, Xiaoyu; Nariai, Naoki; Steffen, Martin et al. (2008) Integration of relational and hierarchical network information for protein function prediction. BMC Bioinformatics 9:350
Ghildiyal, Megha; Seitz, Herve; Horwich, Michael D et al. (2008) Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science 320:1077-81
Ku, Manching; Koche, Richard P; Rheinbay, Esther et al. (2008) Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet 4:e1000242
Nariai, Naoki; Kolaczyk, Eric D; Kasif, Simon (2007) Probabilistic protein function prediction from heterogeneous genome-wide data. PLoS One 2:e337

Showing the most recent 10 out of 14 publications