Orphan genes in pathogenic bacteria

PI: Yanbin Yin, Northern Illinois University

Almost 3,000 completely finished prokaryotic genomes are available in the GenBank database, ~22,000 more are in draft assembly status, and even more are in the sequencing pipeline. New computational tools are increasingly needed to handle this ever-growing set of genomes and to gain new biological insight from it. One particularly interesting observation from the analysis of microbial genomes is that every sequenced genome contains a significant number of orphan genes (ORFans), which have no homologs in other genomes. However, no computational tools are currently available for automated genome-wide identification, classification, annotation and presentation of ORFans in bacterial genomes. Recent pan-genome studies of pathogenic strains and their non-pathogenic relatives in many bacterial species suggest that ORFans play a major role in pathogenesis. We believe that there are yet-to-be-discovered associations between ORFans and well-known agents of pathogenesis such as pathogenicity islands (PAIs), phages, plasmids and other mobile genetic elements, and that such associations could be revealed by a comparative study of ORFans in pathogenic and non-pathogenic genomes of the same species.

Previously we developed a mathematical function to quantitatively score the uniqueness of genes in a genome and applied it to the identification of ORFans in 277 prokaryotic genomes and 1,456 viral genomes. We found that every studied genome contained a significant number of ORFans, although the percentage of ORFans varies considerably between species. Overall, ~14% of prokaryotic genes and ~30% of viral genes are ORFans. This proposal takes into account that new genes (ORFans) have arisen continuously during evolution. We will develop a new computer program implementing a new algorithm that not only identifies ORFans automatically but also classifies them into groups of different ages. We will also apply this new program to 8,431 genomes of 195 human pathogenic bacterial species.
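
The idea of sorting ORFans into age groups can be illustrated with a minimal sketch. It is not the algorithm proposed here, which the abstract does not specify. It assumes that homology search has already been done, and that each homolog found in another genome is represented by the set of taxon names in that genome's lineage. The lineage, the rank names, the choice of family as the broadest rank, and the function name `orfan_age_class` are all hypothetical and chosen only for illustration. A gene is assigned the narrowest rank of the query lineage that contains all of its homologs; a narrower rank suggests a younger gene.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical lineage of the query genome, narrowest rank first.
# Stopping at "family" is an arbitrary cutoff chosen for this sketch.
QUERY_LINEAGE: List[Tuple[str, str]] = [
    ("strain", "Escherichia coli O157:H7"),
    ("species", "Escherichia coli"),
    ("genus", "Escherichia"),
    ("family", "Enterobacteriaceae"),
]


def orfan_age_class(
    hit_lineages: List[Set[str]],
    query_lineage: List[Tuple[str, str]] = QUERY_LINEAGE,
) -> Optional[str]:
    """Return the narrowest rank whose taxon contains every homolog hit.

    hit_lineages holds one set of taxon names per homolog found in
    another genome (self-hits excluded). An empty list means the gene
    has no homologs elsewhere, so it is restricted to the strain itself.
    Returns None if some homolog lies outside the broadest rank listed,
    i.e. the gene is not taxonomically restricted under this cutoff.
    """
    for rank, taxon in query_lineage:
        if all(taxon in lineage for lineage in hit_lineages):
            return rank
    return None


# Example lineages for homologs found in other genomes (illustrative only).
k12 = {"Escherichia coli K-12", "Escherichia coli", "Escherichia", "Enterobacteriaceae"}
salmonella = {"Salmonella enterica", "Salmonella", "Enterobacteriaceae"}
bacillus = {"Bacillus subtilis", "Bacillus", "Bacillaceae"}

print(orfan_age_class([]))                 # strain
print(orfan_age_class([k12]))              # species
print(orfan_age_class([k12, salmonella]))  # family
print(orfan_age_class([k12, bacillus]))    # None
```

In practice the hit lists would come from a sequence similarity search against a reference database with some significance cutoff, and the lineages from a taxonomy resource; both are outside the scope of this sketch.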

Public Health Relevance

The fact that orphan genes (also known as new genes or ORFans) are associated with pathogenicity islands and prophages makes a comparative study of ORFans between pathogenic and non-pathogenic bacterial genomes highly valuable for understanding bacterial pathogenesis. This proposal focuses on ORFans that occur in bacteria pathogenic to humans, which will benefit further studies leading to potential diagnostic markers, drug targets and vaccines for the treatment and prevention of the diseases these pathogens cause.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Academic Research Enhancement Awards (AREA) (R15)
Study Section: Prokaryotic Cell and Molecular Biology Study Section (PCMB)
Program Officer: Brazhnik, Paul
Northern Illinois University
Schools of Arts and Sciences
De Kalb
United States
Huo, Luyang; Zhang, Han; Huo, Xueting et al. (2017) pHMM-tree: phylogeny of profile hidden Markov models. Bioinformatics 33:1093-1095
Ekstrom, Alex; Yin, Yanbin (2016) ORFanFinder: automated identification of taxonomically restricted orphan genes. Bioinformatics 32:2053-5
Hu, Liwei; Taujale, Rahil; Liu, Fang et al. (2016) Draft genome sequence of Talaromyces verruculosus ("Penicillium verruculosum") strain TS63-9, a fungus with great potential for industrial production of polysaccharide-degrading enzymes. J Biotechnol 219:5-6
Nguyen, Marcus; Ekstrom, Alex; Li, Xueqiong et al. (2015) HGT-Finder: A New Tool for Horizontal Gene Transfer Finding and Application to Aspergillus genomes. Toxins (Basel) 7:4035-53