Improvements and Extensions to the BLAST Algorithms

Altschul, Stephen

Abstract

In collaboration with Nidhi Shah and Mihai Pop of the University of Maryland, I opened a new avenue of investigation for this project. An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. However, it is known that the best BLAST hit may not always correspond to the best taxonomic match. An alternative approach involves phylogenetic methods, which take into account alignments and a model of evolution in order to define the taxonomic origin of sequences more accurately. The similarity-search based methods typically run faster than phylogenetic methods and work well when the organisms in the sample are well represented in the database. Phylogenetic methods, on the other hand, can identify new organisms in a sample but are computationally quite expensive. We proposed a two-step approach for metagenomic taxon identification: first, the use of a rapid method that accurately classifies sequences using a reference database (a filtering step), and then the use of a more complex phylogenetic method for the sequences left unclassified by the first step. We explored whether and when using the top BLAST hit(s) yields a correct taxonomic label. We developed a method to detect outliers among BLAST hits in order to separate the phylogenetically most closely related matches from matches to sequences from more distantly related organisms. We used modified BILD (Bayesian Integral Log Odds) scores, a multiple-alignment scoring function, to define the outliers within a subset of top BLAST hits and to assign taxonomic labels. We compared the accuracy of our method to that of the RDP classifier and showed that our method yields fewer misclassifications while properly classifying organisms that are not present in the database.
Finally, we evaluated the use of our method as a preprocessing step before more expensive phylogenetic analyses (in our case TIPP) in the context of real 16S rRNA datasets. Our experiments demonstrated the potential of our method as a filtering step before the use of phylogenetic methods. We completed a paper describing this work and published it in Algorithms for Molecular Biology. In collaboration with Mihai Pop, I also completed work on an expository article on sequence alignment and published it in the CRC Handbook of Discrete and Combinatorial Mathematics.
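
The two-step idea described above can be illustrated with a minimal Python sketch. It is not the published method: the BILD-based outlier test is replaced here by a much simpler stand-in (cutting the ranked hit list at the largest drop in bit score), and every name in it (split_at_largest_gap, common_lineage, classify, two_step, the min_gap threshold, and the fallback callable standing in for a phylogenetic tool such as TIPP) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical hit record: (subject id, bit score, lineage from root to leaf).
Hit = Tuple[str, float, Tuple[str, ...]]
Lineage = Tuple[str, ...]


def split_at_largest_gap(hits: Sequence[Hit], min_gap: float) -> List[Hit]:
    """Return the hits ranked above the largest drop in bit score.

    Stand-in for the BILD-based outlier test (an assumption of this sketch).
    Returns an empty list when there are fewer than two hits or when the
    largest drop is smaller than min_gap, i.e. no clear separation exists.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    if len(ranked) < 2:
        return []
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    if gaps[cut] < min_gap:
        return []
    return ranked[: cut + 1]


def common_lineage(lineages: Sequence[Lineage]) -> Lineage:
    """Longest shared root-to-leaf prefix of the given lineages."""
    prefix = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        prefix.append(ranks[0])
    return tuple(prefix)


def classify(hits: Sequence[Hit], min_gap: float = 20.0) -> Optional[Lineage]:
    """Step 1: label a query from its outlier hits, or return None."""
    top = split_at_largest_gap(hits, min_gap)
    if not top:
        return None
    return common_lineage([h[2] for h in top])


def two_step(
    hits_by_query: Dict[str, Sequence[Hit]],
    fallback: Callable[[str], Lineage],
) -> Dict[str, Lineage]:
    """Step 2: send queries left unlabeled by step 1 to the slower fallback."""
    labels = {}
    for query, hits in hits_by_query.items():
        label = classify(hits)
        labels[query] = label if label is not None else fallback(query)
    return labels
```

In this sketch, a query whose hits show one clear drop in score is labeled with the lineage shared by the hits above the drop, and only the remaining queries reach the expensive fallback. The default min_gap of 20 bits is arbitrary and would need to be tuned on real data.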

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM000072-21
Application #: 9796758
Support Year: 21
Fiscal Year: 2018

Institution

Name: National Library of Medicine

Related projects


NIH 2019 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine
NIH 2018 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine
NIH 2017 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine
NIH 2014 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine
NIH 2013 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine	$106,841
NIH 2012 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine	$177,875
NIH 2011 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine	$59,961
NIH 2010 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine	$50,926
NIH 2009 ZIA LM	Improvements and Extensions to the BLAST Algorithms Altschul, Stephen F. / National Library of Medicine	$66,342

Publications

Shah, Nidhi; Altschul, Stephen F; Pop, Mihai (2018) Outlier detection in BLAST hits. Algorithms Mol Biol 13:7

Altschul, Stephen; Demchak, Barry; Durbin, Richard et al. (2013) The anatomy of successful computational biology software. Nat Biotechnol 31:894-7

Boratyn, Grzegorz M; Schaffer, Alejandro A; Agarwala, Richa et al. (2012) Domain enhanced lookup time accelerated BLAST. Biol Direct 7:12

Altschul, Stephen F; Gertz, E Michael; Agarwala, Richa et al. (2009) PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Res 37:815-24
