Recent advances in DNA sequencing techniques have led to the determination of many entire genome sequences. New insights into the biological functions and evolution of these organisms have been gained from this information. Complete genome sequence data make possible a qualitatively different kind of analysis: the evaluation of apparently missing genes and the potential consequences of their loss for the biology of the organism. To systematically identify potentially missing genes, one must first classify genes from a number of organisms into groups of orthologs. Orthologs are genes from different organisms derived from the same gene in the closest common ancestor of these organisms. They are thus the genes most likely to perform biologically similar functions, and they often share the greatest sequence similarity. Once these classifications are made, one examines the phylogenetic pattern of each ortholog group, that is, which organisms are represented in it, to identify genes potentially lost in the studied organism relative to the reference organisms. In addition, global properties of proteins may be studied from a genomic perspective, for example, the relationship between sequence length and conservation. These approaches may also be used to study bacterial and viral pathogens. We will focus initially on influenza virus, using complete genome sequences to better understand the epidemiology and natural history of the virus. This understanding may be useful in improving surveillance, formulating vaccines, and preparing for pandemics.
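
The phylogenetic-pattern step described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: it assumes ortholog groups have already been built and are available as a mapping from a group identifier to the set of organisms represented in that group. The function name `potentially_lost_genes`, the `min_fraction` parameter, and the toy organism and group names are all hypothetical.

```python
from typing import Dict, List, Set


def potentially_lost_genes(
    ortholog_groups: Dict[str, Set[str]],
    target: str,
    references: List[str],
    min_fraction: float = 1.0,
) -> List[str]:
    """Return IDs of ortholog groups that look lost in `target`.

    A group is reported when the target organism is absent from it and
    at least `min_fraction` of the reference organisms are present.
    This is an illustrative criterion, not a published threshold.
    """
    if not references:
        return []
    candidates = []
    for group_id, organisms in sorted(ortholog_groups.items()):
        if target in organisms:
            continue  # gene is present in the target, so not a loss candidate
        present = sum(1 for ref in references if ref in organisms)
        if present / len(references) >= min_fraction:
            candidates.append(group_id)
    return candidates


# Hypothetical toy data: group ID -> organisms with a member gene in the group.
groups = {
    "OG1": {"orgA", "orgB", "orgC"},
    "OG2": {"orgA", "orgB"},
    "OG3": {"orgB", "orgC"},
    "OG4": {"orgC"},
}

strict = potentially_lost_genes(groups, target="orgA", references=["orgB", "orgC"])
relaxed = potentially_lost_genes(
    groups, target="orgA", references=["orgB", "orgC"], min_fraction=0.5
)
```

With the default `min_fraction=1.0`, a group counts as a candidate only when every reference organism has a member. Lowering the threshold admits groups with patchier patterns, which may reflect the gene never having been present in some lineages rather than a loss in the target.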

National Library of Medicine
Du, Xiangjun; Lipman, David J; Cherry, Joshua L (2013) Why does a protein's evolutionary rate vary over time? Genome Biol Evol 5:494-503
Benson, Dennis A; Cavanaugh, Mark; Clark, Karen et al. (2013) GenBank. Nucleic Acids Res 41:D36-42
Koonin, Eugene V; Landweber, Laura F; Lipman, David J (2013) Biology Direct: celebrating 7 years of open, published peer review. Biol Direct 8:11
Kapustin, Yuri; Souvorov, Alexander; Tatusova, Tatiana et al. (2008) Splign: algorithms for computing spliced alignments with identification of paralogs. Biol Direct 3:20
Bao, Yiming; Bolotov, Pavel; Dernovoy, Dmitry et al. (2008) The influenza virus resource at the National Center for Biotechnology Information. J Virol 82:596-601
Przytycka, Teresa M; Jothi, Raja; Aravind, L et al. (2008) Differences in evolutionary pressure acting within highly conserved ortholog groups. BMC Evol Biol 8:208
Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J et al. (2008) GenBank. Nucleic Acids Res 36:D25-30
Wheeler, David L; Barrett, Tanya; Benson, Dennis A et al. (2008) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 36:D13-21