Proteoforms are key mediators of biological phenotypes. However, there is no systematic way to uniquely identify these chemical entities and no database to catalog proteoforms for future reference and use. For proteoforms to be findable, accessible, interoperable and reusable (FAIR), experimentally verified proteoforms need to be uniquely identified and stored in an open framework for use by the scientific community. If a proteoform is easily recognized and linked to known biological metadata, then future researchers can connect their discoveries with previous ones and formulate new hypotheses. This capability is important to the members of the Consortium for Top-Down Proteomics (CTDP), a non-profit organization established to promote top-down proteomics. Here, we propose to create a scalable, two-tiered informatic framework for the organization and storage of experimentally verified proteoforms. The system will have a central database, which stores a minimal set of information about each proteoform, and a flexible framework for creating individual proteoform knowledgebases. Interest in developing such a resource is high among the top-down proteomic community, software developers, and leading bioinformaticians (see 17 Letters of Support). This includes a strong desire from UniProt to use experimentally verified proteoforms to bolster their leading protein knowledgebase. After the granting period is over, we believe that the central database should be community-owned and curated by the CTDP, and the knowledgebase framework should be open-source and maintained by the top-down community. Therefore, this proposal is split between the development of deliverable software and the expansion of existing community-centered collaborations for software dissemination.
The Specific Aims focus on: 1) establishing norms for communicating proteoforms; 2) developing public proteoform databases and the domain-specific proteoform knowledgebase framework; and 3) engaging the scientific community to promote use of these resources. The success of this project will be measured through its dissemination. Upon completion of this grant, we will have established and prepared a self-governing body to oversee the development and maintenance of bioinformatic software for the storage and dissemination of experimentally verified proteoforms. This body, managed by the CTDP, will have the initial tools to create public proteoform databases and a sustainable governance system in compliance with FAIR principles.

Public Health Relevance

Proteoforms are individual modified proteins responsible for the majority of cellular functions, yet no one knows how many proteoforms exist. In this proposal, we seek to create and populate an informatic framework to allow researchers to catalog proteoforms in a robust and durable way. The proposed atlases will make proteoforms findable, accessible, interoperable and reusable (FAIR) by the proteomics community.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21LM013097-01
Application #: 9727557
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2019-06-01
Project End: 2021-05-31
Budget Start: 2019-06-01
Budget End: 2020-05-31
Support Year: 1
Fiscal Year: 2019
Total Cost:
Indirect Cost:

Name: Northwestern University at Chicago
Department:
Type: Organized Research Units
DUNS #: 160079455
City: Chicago
State: IL
Country: United States
Zip Code: 60611