Protein sequence homology (i.e., descent from a common ancestral sequence) is perhaps the most widely used tool for annotating the putative functions of genes. Homologous proteins often share functions inherited from the common ancestor, so if the function of one protein has been experimentally determined, the function of its homologues can often, but not always, be inferred to be the same. Homology-based inference allows functional data from experimentally tractable model organisms (such as E. coli, yeast, Drosophila, C. elegans and the mouse) to be applied to other organisms, most notably humans. The past several years have seen a dramatic increase in the amount of structured, computationally accessible data available on the functions of proteins and the genes that encode them, primarily using the Gene Ontology (GO). The most useful of these data have been manually entered ("curated") by a biologist after reading papers in the scientific literature. The goal of this proposal is to leverage these literature-derived ontology annotations by using them, in a carefully curated and structured manner, as the basis for inferred annotations in other organisms. We will utilize and extend existing software developed in our groups to develop a web-accessible environment for curation of GO terms in the context of evolutionary relationships, and link the data to biological pathway data and data standards. We will integrate the software into current GO term annotation projects, and support a broad data exchange and dissemination plan across GO and pathway ontology curation efforts and the communities of biomedical researchers they serve.

Public Health Relevance

The current research project provides a cost-effective, accurate methodology for taking experimentally based information about genes in a wide range of well-studied species, and applying this information to understanding human biology, genetics and disease. The results from this methodology will be made broadly available to both researchers and the public, in formats accessible to both people and computers.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM081084-02
Application #
7591614
Study Section
Special Emphasis Panel (ZRG1-BST-Q (01))
Program Officer
Lyster, Peter
Project Start
2008-04-01
Project End
2010-03-31
Budget Start
2009-04-01
Budget End
2010-03-31
Support Year
2
Fiscal Year
2009
Total Cost
$679,490
Indirect Cost
Name
SRI International
Department
Type
DUNS #
009232752
City
Menlo Park
State
CA
Country
United States
Zip Code
94025
Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T et al. (2013) Large-scale gene function analysis with the PANTHER classification system. Nat Protoc 8:1551-66
Mi, Huaiyu; Muruganujan, Anushya; Thomas, Paul D (2013) PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 41:D377-86
Thomas, Paul D; Wood, Valerie; Mungall, Christopher J et al. (2012) On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report. PLoS Comput Biol 8:e1002386
Hunter, Sarah; Jones, Philip; Mitchell, Alex et al. (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40:D306-12
Mi, Huaiyu; Muruganujan, Anushya; Demir, Emek et al. (2011) BioPAX support in CellDesigner. Bioinformatics 27:3437-8
Gaudet, Pascale; Livstone, Michael S; Lewis, Suzanna E et al. (2011) Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief Bioinform 12:449-62
Mi, Huaiyu; Dong, Qing; Muruganujan, Anushya et al. (2010) PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res 38:D204-10
Thomas, Paul D (2010) GIGA: a simple, efficient algorithm for gene tree inference in the genomic age. BMC Bioinformatics 11:312
Hunter, Sarah; Apweiler, Rolf; Attwood, Teresa K et al. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res 37:D211-5