Dr. Gonzalez-Delgado set up a semi-automated pipeline that uses the alignment program HMMER and the Pfam protein domain database to infer the function of viral proteins. The pipeline takes the genome of a human virus, extracts the annotated genes, and runs HMMER to find the domains that make up each viral protein. Once statistically significant Pfam domains have been found, Pfam is queried again to determine which of the proteins contributing to a given domain are human. These human proteins become candidates for a past horizontal gene transfer (HGT) between the virus and its human host. To inform the project with questions relevant to experimental virologists, Dr. Gonzalez-Delgado consulted with Drs. Devico and Lewis, human immunodeficiency virus (HIV) experts at the Institute of Human Virology.
The project has resulted in a submitted manuscript examining human herpesvirus 8 (HHV-8) as a model virus in which HGT events with the human genome are relatively well characterized. HHV-8 shares at least 36% of its genes with the human host. To compare the frequency of HGT across viruses, the manuscript surveyed 10 other viruses that affect human health and concluded that HGT probably occurs between humans and both DNA and RNA viruses, in viral genomes of differing sizes, regardless of DNA transcription strategy. Of special note are the human T-lymphotropic viruses, in which as many as 73% of genes may have been involved in HGT.
Dr. Gonzalez-Delgado is now analyzing sequences of the simian immunodeficiency virus (SIV), which infects macaques. The SIV-macaque system provides an animal model for HIV infection and is being used to answer a question posed by Drs. Lewis and Devico: are the sequences of the infecting and non-infecting HIV particles in a viral inoculum statistically distinguishable?
The founder viruses of the SIV infection appear to differ significantly from the typical virions in the stock inoculum, and the statistical differences have been assigned to specific residues in the viral attachment protein.
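The domain-filtering step of the pipeline described above can be illustrated with a minimal sketch. The summary does not say which HMMER program, output format, or significance cutoff was used, so everything specific here is an assumption: the sketch supposes the viral proteins were searched against Pfam with hmmscan and the per-domain table written by its --domtblout option, and it keeps hits whose independent E-value is at most 1e-5. The function name, the threshold, and the sample rows (including the domain names and accessions) are hypothetical and are not data from the project.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    domain: str      # Pfam domain name (target name column)
    accession: str   # Pfam accession (target accession column)
    protein: str     # viral protein used as the query
    i_evalue: float  # independent E-value of this domain
    ali_from: int    # start of the aligned region on the protein
    ali_to: int      # end of the aligned region on the protein


def parse_domtblout(text: str, max_i_evalue: float = 1e-5) -> list[DomainHit]:
    """Keep per-domain hits at or below an (assumed) i-Evalue cutoff."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 22 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=22)
        if len(fields) < 22:
            continue  # malformed row
        i_evalue = float(fields[12])
        if i_evalue <= max_i_evalue:
            hits.append(DomainHit(fields[0], fields[1], fields[3], i_evalue,
                                  int(fields[17]), int(fields[18])))
    return hits


def domains_by_protein(hits: list[DomainHit]) -> dict[str, list[str]]:
    """Group significant domain accessions under each viral protein."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.protein].append(hit.accession)
    return dict(grouped)


# Hypothetical rows in domtblout layout; not real Pfam entries.
SAMPLE = """\
# target name  accession  tlen  query name  accession  qlen  ...
Example_dom1 PF99991.1 120 viral_orf1 - 300 1.2e-30 100.5 0.1 1 1 3.0e-33 2.5e-30 99.8 0.1 1 120 10 130 8 132 0.98 Hypothetical example domain
Example_dom2 PF99992.1 80 viral_orf2 - 150 0.4 5.2 0.0 1 1 0.01 0.5 4.9 0.0 5 60 20 75 18 80 0.70 Hypothetical weak hit
"""

significant = parse_domtblout(SAMPLE)
print(domains_by_protein(significant))
```

The accessions collected per viral protein would then be the input to the second Pfam query, which looks up which member proteins of each domain are human; that lookup is not sketched here because the summary does not describe how it was performed.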

National Library of Medicine