During the past year, we wrote a new version of the program we have been developing over the past several years for viral sequence annotation using models based on RefSeq annotation. The new version, named VADR (Viral Annotation DefineR, https://github.com/nawrockie/vadr), aligns complete input sequences to their nearest RefSeq sequence and uses that alignment to map the RefSeq annotation onto the input sequences. We are preparing a manuscript on viral annotation with VADR, to be submitted within the next year. This project is being carried out in collaboration with the groups of J. Rodney Brister and Ilene Mizrachi in our center (National Center for Biotechnology Information).
VADR includes some important improvements over the previous version (called dnaorg_scripts). VADR is a flexible annotation system that can annotate features as short as a single nucleotide, because it aligns the full input sequence to a model of the full viral genome instead of identifying and aligning individual features (e.g. CDS and mature peptides) as dnaorg_scripts did. VADR also supports the annotation of structural RNA features and uses both sequence and structure conservation during its alignment phase. Finally, VADR validates the annotation of protein-coding regions using blastx to confirm their protein-coding potential.
VADR is now being used by sequence indexers Linda Yankie and Vincent Calhoun to help annotate Norovirus and Dengue virus sequence submissions. Between January 8, 2019 and August 26, 2019, VADR was used to check 100 submissions of Dengue virus sequences comprising 1507 sequences. Between September 6, 2018 and May 14, 2019, VADR or an earlier version of the software (formerly called dnaorg_scripts) was used to help annotate 69 submissions of Norovirus sequences comprising 1571 sequences. Between May 31, 2019 and August 27, 2019, VADR was used to automatically screen 41 Norovirus sequence submissions.
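The annotation-transfer idea described above (aligning an input sequence to a reference and mapping the reference feature coordinates through that alignment) can be illustrated with a minimal sketch. This is not VADR's code: VADR aligns to a model of the full genome, whereas the sketch assumes a simple pairwise alignment given as two gapped strings. The function name `map_feature` and the example sequences are hypothetical.

```python
def map_feature(ref_aln, qry_aln, ref_start, ref_end):
    """Map a 1-based inclusive reference interval onto the query.

    ref_aln and qry_aln are the two rows of a pairwise alignment
    (equal length, '-' marks a gap). Returns (start, end) in query
    coordinates, or None if no reference position in the interval
    is aligned to a query nucleotide.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned strings must have equal length")

    ref_pos = 0  # ungapped position reached so far in the reference
    qry_pos = 0  # ungapped position reached so far in the query
    mapped = []  # query positions aligned to in-range reference positions

    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if q != "-":
            qry_pos += 1
        # Only columns with a nucleotide in both rows define a mapping.
        if r != "-" and q != "-" and ref_start <= ref_pos <= ref_end:
            mapped.append(qry_pos)

    if not mapped:
        return None
    return mapped[0], mapped[-1]


# Hypothetical toy alignment: the query has a one-nucleotide deletion
# and a one-nucleotide insertion relative to the reference.
ref_row = "ATGAAA-TTTTAA"
qry_row = "ATG-AACTTTTAA"
print(map_feature(ref_row, qry_row, 4, 6))
```

Because every alignment column is visited, the same loop handles a feature of any length, including a single nucleotide, which is the property that whole-sequence alignment provides over aligning features one at a time.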
The program ribosensor, which is part of the ribovore package, is still in use for screening large ribosomal RNA submissions (more than 2500 sequences), but we did not develop new features or improvements for it this year. We may resume development of that tool next year.
Schäffer, Alejandro A; Nawrocki, Eric P; Choi, Yoon et al. (2018) VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening. Bioinformatics 34:755-759
Hatcher, Eneida L; Zhdanov, Sergey A; Bao, Yiming et al. (2017) Virus Variation Resource - improved response to emergent viral outbreaks. Nucleic Acids Res 45:D482-D490
Zhang, Bin; Mao, Yuntao S; Diermeier, Sarah D et al. (2017) Identification and Characterization of a Class of MALAT1-like Genomic Loci. Cell Rep 19:1723-1738