Protein sequence homology (i.e., descent from a common ancestral sequence) is perhaps the most widely used tool for annotating the putative functions of genes. Homologous proteins often share functions inherited from the common ancestor, so if the function of one protein has been experimentally determined, the function of its homologues can often, but not always, be inferred to be the same. Homology-based inference allows functional data from experimentally tractable model organisms (such as E. coli, yeast, Drosophila, C. elegans and the mouse) to be applied to other organisms, most notably humans. The past several years have seen a dramatic increase in the amount of structured, computationally accessible data available on the functions of proteins and the genes that encode them, primarily expressed using the Gene Ontology (GO). The most useful of these data have been manually entered ("curated") by a biologist after reading papers in the scientific literature. The goal of this proposal is to leverage these literature-derived ontology annotations by using them, in a carefully curated and structured manner, as the basis for inferred annotations in other organisms. We will use and extend existing software developed in our groups to build a web-accessible environment for curation of GO terms in the context of evolutionary relationships, and link the data to biological pathway data and data standards. We will integrate the software into current GO term annotation projects, and support a broad data exchange and dissemination plan across GO and pathway ontology curation efforts and the communities of biomedical researchers they serve.
The current research project provides a cost-effective, accurate methodology for taking experimentally based information about genes in a wide range of well-studied species, and applying this information to understanding human biology, genetics and disease. The results from this methodology will be made broadly available to both researchers and the public, in formats accessible to both people and computers.
Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T et al. (2013) Large-scale gene function analysis with the PANTHER classification system. Nat Protoc 8:1551-66
Mi, Huaiyu; Muruganujan, Anushya; Thomas, Paul D (2013) PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 41:D377-86
Hunter, Sarah; Jones, Philip; Mitchell, Alex et al. (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40:D306-12
Thomas, Paul D; Wood, Valerie; Mungall, Christopher J et al. (2012) On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report. PLoS Comput Biol 8:e1002386
Mi, Huaiyu; Muruganujan, Anushya; Demir, Emek et al. (2011) BioPAX support in CellDesigner. Bioinformatics 27:3437-8
Gaudet, Pascale; Livstone, Michael S; Lewis, Suzanna E et al. (2011) Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief Bioinform 12:449-62
Mi, Huaiyu; Dong, Qing; Muruganujan, Anushya et al. (2010) PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res 38:D204-10
Thomas, Paul D (2010) GIGA: a simple, efficient algorithm for gene tree inference in the genomic age. BMC Bioinformatics 11:312
Hunter, Sarah; Apweiler, Rolf; Attwood, Teresa K et al. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res 37:D211-5