Proposal Title: Collaborative Research: PHANTOME: PHage ANnotation TOols and MEthods

Institution: San Diego State University Foundation

Abstract Date: 03/09/09

Viruses are the most abundant biological entities on the planet. Because the most abundant living organisms on Earth are bacteria, the majority of these viruses are phages, the viruses that infect bacteria. Through their diverse lifestyles and gene products, phages play important roles in horizontal gene exchange, in structuring natural microbial communities, and in global biogeochemical cycles. Phages carry genes for some of the deadliest toxins known and can also carry genes that confer adaptive advantages on the hosts they infect. Furthermore, phage genes and the proteins they encode are the outcome of evolution over eons, and we could exploit these products if only we could decode the information in phage DNA sequences. The number of available phage genome sequences is increasing rapidly, yet phages represent the largest global reservoir of uncharacterized genetic material. The bioinformatic tools necessary for interpreting this data have lagged behind the growth in genome sequences. Grants to develop a platform and toolbox of computational tools for phage genome analysis have been awarded to support collaborative research in the laboratories of Drs. Robert Edwards, Department of Computer Sciences, San Diego State University; Mya Breitbart, College of Marine Sciences, University of South Florida; Jeffrey Elhai, Biology Department, Virginia Commonwealth University; and Matthew Sullivan, Department of Ecology and Evolutionary Biology, University of Arizona. Dr. Elhai is an Associate Professor; the other three investigators are Assistant Professors. This collaborative project is creating new computational tools to establish a consistent nomenclature for phage genomes and to annotate phage sequences, both from completely sequenced phage genomes and from environmental phage metagenome sequences.
Most importantly, this project will enable a wide spectrum of researchers, regardless of their computational background, to access the wealth of information contained in phage genomes through familiar graphical interfaces. The collaborators have developed an extensive and far-reaching education plan that targets high school, undergraduate, and graduate students. Students trained in the use of the tools will rotate into trainer roles via user forums and workshops. The postdocs will work across all the labs and thereby gain an unparalleled panoramic view of phage biology.

NATIONAL SCIENCE FOUNDATION Proposal Abstract. Proposal: 0850356. PI Name: Edwards, Robert. Printed from eJacket: 03/10/09.

Project Report

Just as notes are not music, neither is information knowledge. Technological advances over the past 20 years have granted us an overwhelming amount of information concerning DNA sequences, information that can tell us how life works, how it fails, and how it interacts with the environment. However, information by itself doesn't say anything. Humans have to analyze it in creative ways to extract new understanding. A group of researchers at four U.S. universities (San Diego State U., U. Arizona, U. South Florida, and Virginia Commonwealth U.) has collaborated to build tools and underlying infrastructure that enable researchers to navigate the mass of available genomic information concerning bacterial viruses (phages) and to translate it into knowledge. The overall scope of the joint PhAnToMe project and its relation to the needs of U.S. science and technology are described in the summary provided by Rob Edwards (SDSU).
The VCU portion of the project focused on a computer interface, BioBIKE (Biological Integrated Knowledge Environment), that connects the researcher to genomic data and the tools to analyze it. Most researchers rely on prebuilt applications. In this regard, they are no different from almost everyone in the general population who interacts with computers. Applications, like highways, are good for getting researchers to well-defined end points, but they are not effective for exploring new terrain. For that, researchers must craft specialized tools in response to the specific scientific problem at hand. Creative exploration requires that researchers not merely use the computer but also direct (that is, program) it. Very few biomedical researchers can program computers, and the reason is easy to appreciate. Image 1 shows how a simple problem is solved using a conventional programming language. That language is incomprehensible to most researchers and requires a substantial commitment to learn.
Compare that sample with two solutions to the same problem using BioBIKE, developed through the PhAnToMe project (Images 2 and 3). These sets of instructions can be understood by someone familiar with genome sequencing, without the need to learn an arcane language. BioBIKE makes use of familiar conventions, e.g., menus and drag-and-drop, that enable users to run existing tools and to construct new ones, all through the same graphical interface. Graphical programming languages, in general, have the potential to overload the screen. BioBIKE addresses this problem by allowing users to collapse portions of the screen at will, replacing them with user-supplied labels and expanding them only when needed. Most importantly, BioBIKE functions return results not only in a form that is comprehensible to the user but also in a computer-readable form, so that the results of one function may be used as the input to another.
The PhAnToMe database includes over a thousand bacterial and viral genomes, including descriptions of their proteins. Genomes are accessible through menus by their common names. Because BioBIKE provides an environment in which the data is integrated with analytical tools, an important barrier to genome analysis is eliminated: the need to search for appropriate genomes and convert them to the formats required by a desired tool. The genomes are already in BioBIKE, and format conversions are done automatically. Almost all genome databases, including PhAnToMe's, are composed primarily of genomes whose gene descriptions have been generated in an automated fashion, and they are famously rife with errors. There is usually no way for the user to assess the validity of a description. To address this problem, the BioBIKE interface allows users to modify the database, contributing their own knowledge and the evidence underlying an assertion.
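The two ideas above, functions whose results feed directly into other functions and gene descriptions that carry user-supplied evidence, can be illustrated with a minimal sketch. It is written in ordinary Python, not in BioBIKE's graphical language, and it is not taken from the project. Every name in it (Gene, GENOMES, genes_of, longer_than, gc_fraction, annotate) is hypothetical, and the "toy-phage" genome and its sequences are invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """A gene record; all fields here are illustrative assumptions."""
    name: str
    sequence: str                                  # DNA sequence, 5' to 3'
    description: str = "hypothetical protein"      # typical automated label
    evidence: list = field(default_factory=list)   # user-supplied evidence notes


# Toy stand-in for a database in which genomes are already loaded (invented data).
GENOMES = {
    "toy-phage": [
        Gene("g1", "ATGGCGCGCTAA"),
        Gene("g2", "ATGAAATTTAAATAA"),
        Gene("g3", "ATGGGGCCCGGGCCCTAA"),
    ]
}


def genes_of(genome_name):
    """Look up a genome by its common name and return its genes."""
    return GENOMES[genome_name]


def longer_than(genes, min_length):
    """Keep only the genes whose sequence is longer than min_length bases."""
    return [g for g in genes if len(g.sequence) > min_length]


def gc_fraction(gene):
    """Return the fraction of G and C bases in a gene's sequence."""
    seq = gene.sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def annotate(gene, description, evidence):
    """Replace a gene's description and record the evidence behind the change."""
    gene.description = description
    gene.evidence.append(evidence)
    return gene


# The result of one function is passed straight into the next.
long_genes = longer_than(genes_of("toy-phage"), 12)
gc_by_gene = {g.name: round(gc_fraction(g), 2) for g in long_genes}
print(gc_by_gene)  # {'g2': 0.07, 'g3': 0.72}

# A user contributes a corrected description together with its evidence.
updated = annotate(long_genes[0], "example description", "example evidence note")
print(updated.description, updated.evidence)
```

Because each function returns ordinary data rather than formatted text, the same intermediate result (here, long_genes) can be reused by both the GC calculation and the annotation step without any conversion.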
During the tenure of the project, BioBIKE has enabled 137 undergraduates to learn the concepts of molecular biology through discovery, using BioBIKE to perform computational experiments that illuminate the nature of genes, proteins, and genomes. The discovery modules have been made publicly available for use in any course on molecular biology. Over the course of a single semester, most students went from having no programming experience to sufficient mastery to use BioBIKE to complete an independent research project concerning genome analysis. Fourteen undergraduates have participated in the development of BioBIKE, seven of them presenting their work at international meetings.
BioBIKE is tailored to the needs of specific research and educational communities: those that study the molecular biology and genomes of bacteria and their viruses. Though it provides a special-purpose environment, it also possesses powers of expression no less than those of most general-purpose computer languages. BioBIKE may serve as a model of a human-centered computer interface that empowers members of a coherent community to go beyond superficial confrontations with information and the constraints of premade tools, and to bring to bear on scientific problems the creative force of computation.

National Science Foundation (NSF)
Division of Biological Infrastructure (DBI)
Program Officer: Julie Dickerson
Virginia Commonwealth University, United States