The specific aim of the UniProt Consortium is to provide a centralized protein sequence and function resource by enhancing the UniProt Knowledgebase (UniProtKB) and by using a range of dissemination strategies to ensure that the diverse information in UniProt is of use to a broad scientific user community. The UniProtKB will include a variety of data types including, but not limited to, protein sequences, nomenclature, family classifications, and alternatively spliced and modified forms. Relevant information on protein function will be included, along with potential protein interactions, expression patterns, pathways, and Gene Ontology (GO) controlled-vocabulary terms. Annotation methods applied in the UniProtKB will include extraction of information from the literature and computational analyses, as well as integration and mining of large-scale data sets. The types of evidence and methods of annotation for both experimental and computational data will be recorded, along with attribution of the source. The UniProtKB will rely on high interoperability with other databases, while exploiting novel approaches to encourage community curation. To facilitate the use of UniProt, the UniProt Consortium will enhance its existing user-friendly interfaces and tools to support simple and complex queries and the retrieval of large datasets. Database records will be downloadable in a defined, parsable format. An efficient and responsive user support service will be provided. Finally, the UniProt Consortium will maintain the flexibility and adaptability needed to respond to the changing needs of the scientific community.

The broad, long-term objectives of this project are: (1) to provide the scientific community with the Universal Protein Resource (UniProt) as a comprehensive, high-quality and freely accessible resource of protein sequence and functional information; (2) to enable scientists to identify and analyze products of protein-coding genes by making text- and sequence-based queries in the UniProt databases; and (3) to provide efficient and unencumbered access to the databases produced by the UniProt Consortium.

Public Health Relevance

The databases produced by the UniProt Consortium will provide researchers with integrated access to protein sequence and function information by gathering and enriching data from genomics and proteomics projects as well as results published by individual researchers. This is a crucial step in making genomics and proteomics research results easily accessible to support biomedical research in academia and industry, and hence in facilitating the development of preventive and curative strategies for human health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Biotechnology Resource Cooperative Agreements (U41)
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O2))
Program Officer: Bonazzi, Vivien
European Molecular Biology Laboratory
Morales, Joannella; Welter, Danielle; Bowler, Emily H et al. (2018) A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog. Genome Biol 19:21
Herrero, Javier; Muffato, Matthieu; Beal, Kathryn et al. (2016) Ensembl comparative genomics resources. Database (Oxford) 2016:
Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael et al. (2016) UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 1374:23-54
Pundir, Sangya; Martin, Maria J; O'Donovan, Claire et al. (2016) UniProt Tools. Curr Protoc Bioinformatics 53:1.29.1-15
UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res 43:D204-12
Suzek, Baris E; Wang, Yuqi; Huang, Hongzhan et al. (2015) UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31:926-32
Pundir, Sangya; Magrane, Michele; Martin, Maria J et al. (2015) Searching and Navigating UniProt Databases. Curr Protoc Bioinformatics 50:1.27.1-10
Bastian, Frederic B; Chibucos, Marcus C; Gaudet, Pascale et al. (2015) The Confidence Information Ontology: a step towards a standard for asserting confidence in annotations. Database (Oxford) 2015:bav043
Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H et al. (2015) HAMAP in 2015: updates to the protein family classification and annotation system. Nucleic Acids Res 43:D1064-70
Huntley, Rachael P; Sawford, Tony; Mutowo-Meullenet, Prudence et al. (2015) The GOA database: gene Ontology annotation updates for 2015. Nucleic Acids Res 43:D1057-63

Showing the most recent 10 out of 35 publications