During the past year, work on this project focused on three loosely related topics in biological sequence analysis:

1) We added features to our software for annotating genes and other features in virus sequences, in collaboration with the group of J. Rodney Brister in our center (National Center for Biotechnology Information). This software is now being used by sequence indexer Linda Yankie to help annotate incoming Norovirus sequence submissions.

2) We continued development of algorithms and software tools to improve the identification of nucleotide sequences that are contaminated by cloning vectors. These tools are currently being applied to correct thousands of contaminated sequences stored in the non-redundant (nr) database of sequences used by researchers worldwide. As of August 2018, we have corrected 11,290 sequences. The tools were made publicly available at https://github.com/aaschaffer/vecscreen_plus_taxonomy. A paper describing this work was published in the past year: Alejandro A Schäffer, Eric P Nawrocki, Yoon Choi, Paul A Kitts, Ilene Karsch-Mizrachi, Richard McVeigh; VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening, Bioinformatics, Volume 34, Issue 5, 1 March 2018, Pages 755-759, https://doi.org/10.1093/bioinformatics/btx669.

3) We added features to our software packages ribotyper and ribosensor for analyzing ribosomal RNA sequences. ribosensor continues to be used within our center to evaluate the validity of batch submissions to GenBank that claim to contain at least 5000 structural RNA sequences within any of the 10 categories covered by ribosensor. ribotyper now contains a prototype program, ribodbmaker, for creating representative BLAST databases of different classes of ribosomal RNAs (e.g. eukaryotic 18S SSU rRNA) for wider use by the community, and for selecting candidate sequences for promotion to the RefSeq database. ribodbmaker is currently being tested internally.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Production Facilities Intramural Research (ZIB)
Project #
1ZIBLM622435-03
Application #
9781339
Support Year
3
Fiscal Year
2018
Name
National Library of Medicine
Schäffer, Alejandro A; Nawrocki, Eric P; Choi, Yoon et al. (2018) VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening. Bioinformatics 34:755-759
Hatcher, Eneida L; Zhdanov, Sergey A; Bao, Yiming et al. (2017) Virus Variation Resource - improved response to emergent viral outbreaks. Nucleic Acids Res 45:D482-D490
Zhang, Bin; Mao, Yuntao S; Diermeier, Sarah D et al. (2017) Identification and Characterization of a Class of MALAT1-like Genomic Loci. Cell Rep 19:1723-1738