While standards in the reporting of scientific methods are critical to producing reproducible science, meeting such standards is tedious and difficult. Checklists and instructions are tough to follow, often resulting in low and inconsistent compliance. Scientific journals and societies, as well as the National Institutes of Health, are now actively proposing general guidelines to address reproducibility issues, particularly in the reporting of methods, but the trickier part is to train the biomedical community to use these standards effectively. To support new standards in methods reporting, especially the RRID standard for Rigor and Transparency of Key Biological Resources, we propose to build Sci-Score, a text-mining-based tool suite to help authors meet the standard. Sci-Score will provide an automated check on compliance with the RRID standard, already implemented by over 100 journals including Cell, Journal of Neuroscience, and eLife, and with other Rigor and Transparency standards put forward by the NIH. The innovation behind Sci-Score is the provision of a score, which can be obtained by individual investigators or journals. This score reflects an aspect of the quality of methods reporting. We posit that the score will serve as a tool that investigators can use to compete with themselves and each other, the way they currently compete on metrics of popularity, e.g., the H-index. In Phase I of this project and before, our group successfully developed a text-mining algorithm that can detect antibodies, cell lines, organisms, and digital resources (all four RRID types) and created a preliminary score. We propose to extend this approach to all research inputs, such as chemicals and plasmids, that are requested as part of Cell Press's STAR Methods (www.cell.com/star-methods).
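As a minimal illustration of the kind of check involved, RRIDs follow recognizable prefix conventions, so the simplest baseline for compliance detection is pattern matching over methods text. The patterns below are illustrative approximations for the four resource types named above, not the actual Sci-Score rules, which rely on trained text-mining models:

```python
import re

# Hypothetical patterns approximating RRID syntax for the four resource
# types named in the abstract (illustrative, not the Sci-Score rules).
RRID_PATTERN = re.compile(
    r"RRID:\s*("
    r"AB_\d+"                                  # antibodies
    r"|CVCL_[A-Z0-9]{4}"                       # cell lines
    r"|(?:IMSR_JAX:|MGI:|ZFIN:ZDB-GENO-)\S+"   # organisms
    r"|SCR_\d+"                                # software/digital resources
    r")"
)

def find_rrids(methods_text: str) -> list[str]:
    """Return all RRID-like identifiers found in a methods section."""
    return ["RRID:" + m for m in RRID_PATTERN.findall(methods_text)]

text = ("Cells were stained with anti-GFAP (RRID:AB_2314535) and imaged "
        "with ImageJ (RRID:SCR_003070).")
print(find_rrids(text))  # → ['RRID:AB_2314535', 'RRID:SCR_003070']
```

A real pipeline must also flag resources that are mentioned but lack an RRID, which is where text mining goes beyond simple pattern matching.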
We also propose to build a set of algorithms to detect whether authors address the major sources of irreproducibility outlined by the NIH, including investigator blinding, proper randomization, and sufficient reporting of sex and other biological variables. Resource identification, along with other quality metrics, will be used to score the quality of methods-section text. If successful, the tool could be used by editors, reviewers, and investigators to improve the quality of scientific papers. Our Phase II specific aims are: 1) enhancing and hardening the core natural language processing pipelines to recognize a broader range of sentences in near real time; and 2) building a set of modular tools for different groups of users to take advantage of the text-mining capability developed in Aim 1. At the end of Phase II, we should have a commercially viable product that can be licensed to serve the needs of publishers and the broader research community.
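A crude sketch of how rigor-criteria detection could feed a score: screen the text for cues associated with each NIH criterion and report the fraction of criteria addressed. The cue lists and the equal weighting are purely illustrative assumptions; the proposed pipelines use trained sentence classifiers rather than keyword matching:

```python
# Naive keyword screen for the NIH rigor criteria named above; the cue
# words and equal weighting are illustrative assumptions only.
RIGOR_CUES = {
    "blinding": ("blind", "blinded", "masked"),
    "randomization": ("random", "randomized", "randomization"),
    "sex_reported": ("male", "female", "both sexes", "sex"),
}

def rigor_score(methods_text: str) -> float:
    """Fraction of rigor criteria with at least one cue in the text."""
    lower = methods_text.lower()
    hits = sum(any(cue in lower for cue in cues)
               for cues in RIGOR_CUES.values())
    return hits / len(RIGOR_CUES)

text = ("Animals of both sexes were randomized to treatment groups and "
        "scored by a blinded observer.")
print(rigor_score(text))  # → 1.0
```

Keyword screens of this kind over-trigger on incidental mentions (e.g., "random" in "random-access"), which is precisely why the abstract proposes sentence-level NLP models instead.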
Standards for scientific methods reporting are critical to producing reproducible science, but meeting such standards is difficult. Checklists and instructions are tough to follow, often resulting in low and inconsistent compliance. To support new standards in methods reporting, especially the RRID standard for Rigor and Transparency, we propose to build Sci-Score, a text-mining-based tool suite to help authors meet the standard. Sci-Score will provide an automated check on compliance with the RRID standard implemented by over 100 journals, including Cell, Journal of Neuroscience, and eLife. Sci-Score provides an automated rating of the quality of methods reporting in submitted articles, giving feedback to authors, reviewers, and editors on how to improve compliance with RRIDs and other standards.