During the past year, work on this project focused on four loosely related topics in biological sequence analysis: 1) Eric Nawrocki completed a project with the lab of David Spector (Cold Spring Harbor) characterizing which vertebrate genomes contain orthologs of the human noncoding RNA MALAT1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1). A paper about this work was published in Cell Reports. 2) In collaboration with the group of J. Rodney Brister in our Center, we added features to our software for annotating genes and other features in virus sequences. A paper that touches on this work was published in Nucleic Acids Research. 3) We continued developing algorithms and software tools to improve the identification of nucleotide sequences that are contaminated by cloning vectors. These tools are currently being applied to correct thousands of contaminated sequences stored in the non-redundant (nr) sequence database used by researchers worldwide. As of August 2017, we have corrected 8,303 sequences. The tools are publicly available at https://github.com/aaschaffer/vecscreen_plus_taxonomy. A paper has been submitted. 4) We extended our prototype software tool for recognizing bacterial 16S rRNA sequences into a much more sophisticated and general tool that recognizes 10 classes of structural RNAs. The expanded tool is called ribosensor and is being used within our center (National Center for Biotechnology Information) to evaluate the validity of batch submissions to GenBank that claim to contain at least 5000 structural RNA sequences within any of the 10 categories covered by ribosensor.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Production Facilities Intramural Research (ZIB)
Project #
1ZIBLM622435-02
Application #
9550591
Support Year
2
Fiscal Year
2017
Name
National Library of Medicine
Schäffer, Alejandro A; Nawrocki, Eric P; Choi, Yoon et al. (2018) VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening. Bioinformatics 34:755-759
Hatcher, Eneida L; Zhdanov, Sergey A; Bao, Yiming et al. (2017) Virus Variation Resource - improved response to emergent viral outbreaks. Nucleic Acids Res 45:D482-D490
Zhang, Bin; Mao, Yuntao S; Diermeier, Sarah D et al. (2017) Identification and Characterization of a Class of MALAT1-like Genomic Loci. Cell Rep 19:1723-1738