This award is issued in response to Notice OD-09-060, Recovery Act Administrative Supplements Providing Summer Research Experiences for Students and Science Educators.
DESCRIPTION (provided by applicant): With little chance of being discovered and decreasing budgets, yet sustained pressure to publish, the unethical practices of duplicate publication and plagiarism are significant. Editors and reviewers have had no robust method to identify existing and potential duplicate scientific articles, so these practices have gone unchecked until now. eTBLAST, a text similarity search tool available to all on the web, has been used to demonstrate that putative duplicate or plagiarized articles can be detected with high sensitivity and specificity by systematically comparing each Medline abstract (or abstract in review) to all other Medline records. We hypothesize that three things together will be a substantial deterrent, ultimately improving the quality of reported science for all: rigorous identification of purveyors of this behavior, exhaustive tagging of duplicate articles, and the availability of a search tool customized for editors, reviewers, granting officials and others to detect potential problem manuscripts before they are accepted for publication. We will address this through the following specific aims:
1) Refine statistical predictors, thresholds, signatures and algorithms to maximize the efficiency with which we can detect putative duplicate and plagiarized articles within Medline.
2) Systematically check every Medline record against every other to develop a public database of questionable articles that have been manually reviewed and verified to assign a probability of duplication.
3) Perform an analysis of trends, rates and any statistically relevant distributions to understand and address root causes for this behavior.
4) Create a secure resource that is available and open to all journals and reviewers, thus enabling them to estimate novelty and probable overlap with previous publications prior to acceptance.
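The all-against-all comparison described in the abstract and in aim 2 can be illustrated with a minimal sketch. This is not eTBLAST's algorithm, which the record does not describe; it assumes a simple Jaccard similarity over word trigrams, and the function names, the `threshold` value and the sample records are hypothetical choices made only for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_similar_pairs(abstracts, threshold=0.5, k=3):
    """Compare every abstract with every other and return pairs whose
    similarity meets the (hypothetical) threshold, highest score first."""
    sets = {rec_id: shingles(text, k) for rec_id, text in abstracts.items()}
    flagged = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            flagged.append((id_a, id_b, score))
    return sorted(flagged, key=lambda pair: -pair[2])


# Made-up records for illustration only; not real Medline data.
sample = {
    "A": "We report a novel method for detecting duplicate articles in the literature",
    "B": "We report a novel method for detecting duplicate articles in the literature",
    "C": "Microsatellite variation differs between tumor and germline samples",
}
candidates = flag_similar_pairs(sample)
```

Pairs flagged this way are only candidates; the abstract calls for manual review before a probability of duplication is assigned. An exhaustive pairwise loop also grows quadratically with the number of records, so a collection the size of Medline would need an indexed search rather than this brute-force comparison.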

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
3R01LM009758-03S1
Application #
8121295
Study Section
Special Emphasis Panel (ZRG1-HOP-S (50))
Program Officer
Ye, Jane
Project Start
2007-09-30
Project End
2010-09-29
Budget Start
2009-12-01
Budget End
2010-09-29
Support Year
3
Fiscal Year
2009
Total Cost
$84,903
Indirect Cost
Name
Virginia Polytechnic Institute and State University
Department
Type
Organized Research Units
DUNS #
003137015
City
Blacksburg
State
VA
Country
United States
Zip Code
24061
Garner, H R (2011) Combating unethical publications with plagiarism detection services. Urol Oncol 29:95-9
McIver, L J; Fondon 3rd, J W; Skinner, M A et al. (2011) Evaluation of microsatellite variation in the 1000 Genomes Project pilot studies is indicative of the quality and utility of the raw data and alignments. Genomics 97:193-9
Galindo, Cristi L; McIver, Lauren J; Tae, Hongseok et al. (2011) Sporadic breast cancer patients' germline DNA exhibit an AT-rich microsatellite signature. Genes Chromosomes Cancer 50:275-83
Errami, Mounir; Sun, Zhaohui; George, Angela C et al. (2010) Identifying duplicate content using statistically improbable phrases. Bioinformatics 26:1453-7
Sun, Zhaohui; Errami, Mounir; Long, Tara et al. (2010) Systematic characterizations of text similarity in full text biomedical publications. PLoS One 5:e12704
Long, Tara C; Errami, Mounir; George, Angela C et al. (2009) Scientific integrity. Responding to possible plagiarism. Science 323:1293-4
Errami, Mounir; Sun, Zhaohui; Long, Tara C et al. (2009) Deja vu: a database of highly similar citations in the scientific literature. Nucleic Acids Res 37:D921-4
Errami, Mounir; Hicks, Justin M; Fisher, Wayne et al. (2008) Deja vu--a study of duplicate citations in Medline. Bioinformatics 24:243-9
Wren, Jonathan D (2008) URL decay in MEDLINE--a 4-year follow-up study. Bioinformatics 24:1381-5
Giles, Cory B; Wren, Jonathan D (2008) Large-scale directional relationship extraction and resolution. BMC Bioinformatics 9 Suppl 9:S11