The specific aim of the UniProt Consortium is to provide a centralized resource of protein sequence and function by enhancing the UniProt Knowledgebase (UniProtKB), and to ensure through a range of dissemination strategies that the diverse information in UniProt is useful to a broad scientific user community. UniProtKB will include a variety of data types including, but not limited to, protein sequences, nomenclature, family classifications, and alternatively spliced and modified protein forms. Relevant information on protein function will be included, together with potential protein interactions, expression patterns, pathways and Gene Ontology (GO) controlled vocabulary terms. Annotation methods applied in UniProtKB will include extraction of information from the literature and computational analyses, as well as the integration and mining of large-scale data sets. For both experimental and computational data, the type of evidence and the method of annotation will be recorded along with attribution of the source. UniProtKB will rely on high interoperability with other databases, while exploiting novel approaches to encourage community curation. To facilitate the use of UniProt, the UniProt Consortium will enhance its existing user-friendly interfaces and tools to support both simple and complex queries and the retrieval of large datasets. Database records will be downloadable in defined, parsable formats. An efficient and responsive user support service will be provided. Finally, the UniProt Consortium will maintain the flexibility and adaptability needed to respond to the changing needs of the scientific community.

The broad, long-term objectives of this project are: (1) to provide the scientific community with the Universal Protein Resource (UniProt) as a comprehensive, high-quality and freely accessible resource of protein sequence and functional information; (2) to enable scientists to identify and analyze the products of protein-coding genes through text- and sequence-based queries of the UniProt databases; and (3) to provide efficient and unencumbered access to the databases produced by the UniProt Consortium.

Public Health Relevance

The databases produced by the UniProt Consortium will provide researchers with integrated access to protein sequence and functional information by gathering and enriching data from genomics and proteomics projects, as well as results published by individual researchers. This is a crucial step in making genomics and proteomics research results easily accessible to biomedical research in academia and industry, and hence in facilitating the development of preventive and curative strategies for human health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Biotechnology Resource Cooperative Agreements (U41)
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O2))
Program Officer: Bonazzi, Vivien
European Molecular Biology Laboratory
Masson, Patrick; Hulo, Chantal; de Castro, Edouard et al. (2014) An integrated ontology resource to explore and study host-virus relationships. PLoS One 9:e108075
Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N et al. (2014) Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database (Oxford) 2014:bau016
Seymour, Sean L; Farrah, Terry; Binz, Pierre-Alain et al. (2014) A standardized framing for reporting protein identifications in mzIdentML 1.2. Proteomics 14:2389-99
Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud et al. (2014) Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 35:927-35
Pavelin, Katrina; Pundir, Sangya; Cham, Jennifer A (2014) Ten simple rules for running interactive workshops. PLoS Comput Biol 10:e1003485
UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42:D191-8
Jupp, Simon; Malone, James; Bolleman, Jerven et al. (2014) The EBI RDF platform: linked open data for the life sciences. Bioinformatics 30:1338-9
Huntley, Rachael P; Harris, Midori A; Alam-Faruque, Yasmin et al. (2014) A method for increasing expressivity of Gene Ontology annotations using a compositional approach. BMC Bioinformatics 15:155
Welter, Danielle; MacArthur, Jacqueline; Morales, Joannella et al. (2014) The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42:D1001-6
Velankar, Sameer; Dana, Jose M; Jacobsen, Julius et al. (2013) SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res 41:D483-9

Showing the most recent 10 out of 22 publications