While standards in the reporting of scientific methods are critical to producing reproducible science, meeting such standards is difficult. Checklists and instructions are tough to follow, often resulting in low and inconsistent compliance. Scientific journals and societies, as well as the National Institutes of Health, are now actively proposing general guidelines to address reproducibility issues, particularly in the reporting of methods (e.g., www.cell.com/star-methods), but the trickier part will be to train the biomedical community to use these standards to effectively improve how scientific methods are communicated. To support new standards in methods reporting, specifically the RRID standard for Rigor and Transparency of Key Biological Resources, we propose to build Sci-Score, a text-mining-based tool suite to help authors meet the standard. Sci-Score will provide an automated check on compliance with the RRID standard, already implemented by over 100 journals including Cell, Journal of Neuroscience, and eLife. The innovation behind Sci-Score is the provision of a score, obtainable by individual investigators, that reflects a numerical validation of the quality of their methods reporting. We posit that the score will serve as a tool that investigators and journals can use to compete with themselves and each other, or at the very least allow them to see how close they are to the average in meeting quality requirements. Recently, our group developed a text mining algorithm that has been used successfully to detect software tools and databases from the SciCrunch Registry in published papers. Digital tools are one of four resource types that the RRID standard identifies. We propose to extend this approach to the other three types of entities: antibodies, cell lines, and model organisms. Resource identification, along with other quality metrics, will be used to train an algorithm to score the overall quality of the methods document.
If successful, the tool could be used by editors, reviewers, and investigators to increase the number of RRIDs, and therefore the quality of the descriptors of key biological resources, in published papers. This SBIR project will build a set of algorithms similar to the resource-finding pipeline and develop it into an industrially robust and reconfigurable software system. Our Phase I specific aims are to 1) create gold sets of data for each resource type and train a set of algorithms for each resource type; 2) design and evaluate the scoring system; and 3) design and evaluate a report-generating system based on the previous aims. In Phase II, we will develop a scalable backend infrastructure to serve the needs of scientific publishers and the research community.
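As a rough illustration of the kind of check described above, the following minimal sketch finds RRID-style identifiers in a methods passage and turns a count into a toy score. It is not the proposed Sci-Score algorithm, which relies on trained text-mining models. The regular expression, the prefix-to-type mapping, the function names, and the scoring formula are all assumptions made for illustration, and the identifiers in the example text are placeholders rather than real registry entries.

```python
import re

# Assumed pattern: "RRID:" followed by an authority prefix, an underscore,
# and an accession string. Real-world RRID syntax may vary beyond this.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+)_([A-Za-z0-9:_-]+)")

# Hypothetical mapping from identifier prefix to resource type.
PREFIX_TO_TYPE = {
    "AB": "antibody",
    "SCR": "software tool or database",
    "CVCL": "cell line",
}


def find_rrids(text):
    """Return a list of (rrid, resource_type) pairs found in the text."""
    results = []
    for match in RRID_PATTERN.finditer(text):
        prefix, accession = match.group(1), match.group(2)
        resource_type = PREFIX_TO_TYPE.get(prefix.upper(), "organism or other")
        results.append((f"RRID:{prefix}_{accession}", resource_type))
    return results


def toy_score(n_resources_mentioned, n_with_rrid, max_score=10):
    """Illustrative score: share of mentioned resources that carry an RRID."""
    if n_resources_mentioned == 0:
        return 0.0
    return round(max_score * n_with_rrid / n_resources_mentioned, 1)


if __name__ == "__main__":
    # Placeholder identifiers, not real registry entries.
    methods = (
        "Sections were stained with an anti-GFAP antibody (RRID:AB_000000) "
        "and images were analyzed with image software (RRID:SCR_000000)."
    )
    for rrid, resource_type in find_rrids(methods):
        print(rrid, "->", resource_type)
    # Suppose 4 resources were detected in total and 2 carried RRIDs.
    print(toy_score(4, 2))
```

In this sketch, the number of resources mentioned is passed in by hand. In the system described here, that count would come from the resource-detection algorithms trained on the gold sets.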
Standards for scientific methods reporting are critical to producing reproducible science, but meeting such standards is difficult. Checklists and instructions are tough to follow, often resulting in low and inconsistent compliance. To support new standards in methods reporting, specifically the RRID standard for Rigor and Transparency, we propose to build Sci-Score, a text-mining-based tool suite to help authors meet the standard. Sci-Score will provide an automated check on compliance with the RRID standard implemented by over 100 journals including Cell, Journal of Neuroscience, and eLife. Sci-Score will provide a score rating the quality of methods reporting in submitted articles, which provides feedback to authors, reviewers, and editors on how to improve compliance with RRIDs and other standards.