Streamlined capture and curation of unpublished data

Sternberg, Paul; Schedl, Tim

Abstract

The Streamlined capture and curation of unpublished data project will establish a new data capture and dissemination paradigm that automatically and simultaneously captures and ingests biomedical data into authoritative repositories and publishes them in an online, open access journal, 'Micropublication: biology'. This platform will introduce a paradigm shift in curation, allowing authors to submit the output of their research directly into pre-designed intelligent web forms. Upon submission, these forms will seamlessly integrate, atomize, and submit metadata into authoritative data repositories, enhancing the efficiency and accuracy of curation. Simultaneously, the process will automatically generate a 'publication-like' PDF file that will be publishable and citable according to the findable, accessible, interoperable and reusable (FAIR) data principles. We call these single-result experiments, streamlined with no narrative, 'micropublications'; they are ideal for, among other things, results that often go unpublished. Authors will preserve provenance and establish credit for their research, and the automated flow of data they submit will be made publicly accessible for further re-use in established and authoritative data repositories such as the Model Organism Database (MOD) members of the Alliance of Genome Resources (AGR): FlyBase, Mouse Genome Database (MGI), Rat Genome Database (RGD), Saccharomyces Genome Database (SGD), WormBase, and Zebrafish Model Organism Database (ZFIN). Through these repositories, all submitted metadata will automatically be integrated with existing datasets that have been manually extracted from the literature for almost two decades. These data will be peer reviewed, ensuring that they are of high quality and meet community standards. Micropublications will be citable and discoverable, and will comply with the Minimum Information Standards for scientific data reporting.
In addition, researchers will be able to share both positive and negative data with the scientific community, fulfilling funding agencies' requirements to share all data coming from publicly funded research. After establishing this data retrieval/publication pipeline first with WormBase and then with the other AGR member databases, we will work to expand it to non-member but otherwise critical biomedical model organism databases, such as Xenbase (Xenopus laevis and tropicalis Database), DictyBase (Dictyostelium discoideum database), and PomBase (Schizosaccharomyces pombe Database), among others.

Public Health Relevance

Curating and integrating biological and biomedical knowledge into computable, publicly available resources is a key step in expediting new discoveries, but it is costly and time-consuming. We will directly incorporate reviewed scientific digital data at the time of production, enabling immediate data transfer to the scientific community and accelerating the pace of scientific discoveries that will ultimately improve health. We will start with data that are otherwise unpublished, including single-result novel findings, replications of published experiments, and negative data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project--Cooperative Agreements (U01)
Project #: 5U01LM012672-02
Application #: 9534745
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Florance, Valerie

Project Start: 2017-08-01
Project End: 2021-07-31
Budget Start: 2018-08-01
Budget End: 2019-07-31
Support Year: 2
Fiscal Year: 2018

Institution

Name: California Institute of Technology
Type: Schools of Arts and Sciences
DUNS #: 009584210

City: Pasadena
State: CA
Country: United States
Zip Code: 91125

Related projects


NIH 2020 U01 LM	Streamlined capture and curation of unpublished data Sternberg, Paul Warren; Schedl, Tim / California Institute of Technology
NIH 2019 U01 LM	Streamlined capture and curation of unpublished data Sternberg, Paul Warren; Schedl, Tim / California Institute of Technology
NIH 2018 U01 LM	Streamlined capture and curation of unpublished data Sternberg, Paul Warren; Schedl, Tim / California Institute of Technology
NIH 2017 U01 LM	Streamlined capture and curation of unpublished data Sternberg, Paul Warren; Schedl, Tim / California Institute of Technology

Publications

Raciti, Daniela; Yook, Karen; Harris, Todd W et al. (2018) Micropublication: incentivizing community curation and placing unpublished data into the public domain. Database (Oxford) 2018:

Lee, Raymond Y N; Howe, Kevin L; Harris, Todd W et al. (2018) WormBase 2017: molting into a new stage. Nucleic Acids Res 46:D869-D874
