HMMER and Infernal: Finding distant homologs of sequences and RNA structures

Eddy, Sean

Abstract

Fast and sensitive sequence homology searches are fundamental tools in molecular biology. Our understanding of the human genome sequence depends in part on comparative sequence analysis of more experimentally accessible model organisms, and indeed on sequence comparisons across the tree of life. This proposal describes a plan to support two software packages for sequence homology search and alignment, HMMER and Infernal. HMMER is for protein and DNA sequence comparison, and it underlies many protein domain family databases and many genome sequence annotation procedures. Infernal is for RNA secondary structure/sequence comparison, and it is the foundation of various RNA structure/sequence analysis tools including the Rfam database of RNA families. Recent developments (including a new collaboration with the EMBL European Bioinformatics Institute to provide HMMER web servers, an upcoming HMMER4 release with new memory-efficient algorithms, and an expansion of the development teams to multiple universities and sites) suggest that beyond their current niches in genome analysis, both software packages are in a position to increase the scope and importance of their applications. To improve the foundation of software engineering in these packages, the proposal has three specific aims for improving speed, scaling, and support.
The first aim focuses on speed improvements, especially in parallelization, both on typical desktop computers and on high performance computing resources. A measurable and important milestone of this aim is to make sequence homology searches run at interactive speeds (less than 1 second response time), the speed of a Google search, which could radically change the way biologists interact with sequence data.
The second aim focuses on scaling improvements. Biological sequence data are growing exponentially, and we will make sure that the software can handle (and help biologists visualize) very large numbers of significant homologs, up to millions and more.
The third aim focuses on improving support for the software, especially in improving our ability to engage a wider community of academic and industry developers who contribute to our codebases, and who use parts of our codebases in their own work.

Public Health Relevance

Interpreting the human genome sequence, or any other genome sequence, depends in part on recognizing evolutionarily related genes across the tree of life, especially in experimentally accessible model organisms. Computational tools for fast and sensitive sequence comparison are fundamental, and the exponentially growing scale of biological sequence data makes it essential that these computational tools are well engineered and highly efficient. This proposal describes a plan to support engineering of two widely used software packages: HMMER, for protein and DNA sequence comparisons, and Infernal, for RNA sequence/structure comparison.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG009116-04
Application #: 9736760
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Sen, Shurjo Kumar

Project Start: 2016-09-16
Project End: 2020-06-30
Budget Start: 2019-07-01
Budget End: 2020-06-30
Support Year: 4
Fiscal Year: 2019

Institution

Name: Harvard University
Department: Microbiology/Immun/Virology
Type: Schools of Arts and Sciences
DUNS #: 082359691

City: Cambridge
State: MA
Country: United States
Zip Code: 02138

Related projects


NIH 2019 R01 HG	HMMER and Infernal: Finding distant homologs of sequences and RNA structures Eddy, Sean R. / Harvard University
NIH 2018 R01 HG	HMMER and Infernal: Finding distant homologs of sequences and RNA structures Eddy, Sean R. / Harvard University
NIH 2017 R01 HG	HMMER and Infernal: Finding distant homologs of sequences and RNA structures Eddy, Sean R. / Harvard University
NIH 2016 R01 HG	HMMER and Infernal: Finding distant homologs of sequences and RNA structures Eddy, Sean R. / Harvard University	$422,500

Publications

Potter, Simon C; Luciani, Aurélien; Eddy, Sean R et al. (2018) HMMER web server: 2018 update. Nucleic Acids Res 46:W200-W204

Nawrocki, Eric P; Jones, Thomas A; Eddy, Sean R (2018) Group I introns are widespread in archaea. Nucleic Acids Res 46:7970-7976

Kalvari, Ioanna; Argasinska, Joanna; Quinones-Olvera, Natalia et al. (2018) Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res 46:D335-D342
