The Streamlined capture and curation of unpublished data project will establish a new data capture and dissemination paradigm that automatically and simultaneously ingests biomedical data into authoritative repositories and publishes them in an online, open-access journal, "Micropublication: biology". This platform will change how curation is done by allowing authors to submit their research output directly through pre-designed, intelligent web forms. Upon submission, these forms will integrate and atomize the metadata and submit them to authoritative data repositories, improving the efficiency and accuracy of curation. At the same time, the process will automatically generate a "publication-like" PDF file that is publishable and citable in accordance with the findable, accessible, interoperable, and reusable (FAIR) data principles. We call these streamlined, narrative-free reports of single-result experiments "micropublications"; they are ideal for, among other things, results that often go unpublished. Authors will preserve provenance and establish credit for their research, and the data they submit will flow automatically into established, authoritative data repositories, where they will be publicly accessible for further reuse. These repositories include the Model Organism Database (MOD) members of the Allied Genome Resources (AGR): FlyBase, Mouse Genome Database (MGI), Rat Genome Database (RGD), Saccharomyces Genome Database (SGD), WormBase, and Zebrafish Model Organism Database (ZFIN). Through these repositories, all submitted metadata will automatically be integrated with existing datasets that have been manually extracted from the literature for almost two decades. The data will be peer reviewed to ensure that they are of high quality and meet community standards. Micropublications will be citable and discoverable, and will comply with the Minimum Information Standards for scientific data reporting.
In addition, researchers will be able to share both positive and negative data with the scientific community, fulfilling funding agencies' requirements to share all data from publicly funded research. After establishing this data retrieval and publication pipeline, first with WormBase and then with the other AGR member databases, we will work to extend it to non-member but nonetheless critical biomedical model organism databases, such as Xenbase (Xenopus laevis and Xenopus tropicalis database), DictyBase (Dictyostelium discoideum database), and PomBase (Schizosaccharomyces pombe database).

Public Health Relevance

Curating and integrating biological and biomedical knowledge into computable, publicly available resources is a key step in expediting new discoveries, but it is costly and time-consuming. We will directly incorporate reviewed scientific digital data at the time of production, enabling immediate data transfer to the scientific community and accelerating the pace of scientific discoveries that will ultimately improve health. We will start with data that are otherwise unpublished, including novel single-result findings, replications of published experiments, and negative data.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Florance, Valerie
California Institute of Technology
Schools of Arts and Sciences
United States
Raciti, Daniela; Yook, Karen; Harris, Todd W et al. (2018) Micropublication: incentivizing community curation and placing unpublished data into the public domain. Database (Oxford) 2018:
Lee, Raymond Y N; Howe, Kevin L; Harris, Todd W et al. (2018) WormBase 2017: molting into a new stage. Nucleic Acids Res 46:D869-D874