Orphan genes in pathogenic bacteria
PI: Yanbin Yin, Northern Illinois University

Almost 3,000 completely finished prokaryotic genomes are available in the GenBank database, ~22,000 more are in draft assembly status, and even more are in the sequencing pipeline. New computational tools are increasingly needed to handle this ever-growing number of genomes and to extract new biology from them. One particularly interesting observation from the analysis of microbial genomes is that every sequenced genome contains a significant number of orphan genes (or ORFans) without homologs in other genomes. However, no computational tools are currently available for automated genome-wide identification, classification, annotation and presentation of ORFans in bacterial genomes. Recent pan-genome studies of pathogenic strains and their non-pathogenic relatives in many bacterial species suggest that ORFans play a major role in pathogenesis. We believe that there are yet-to-be-discovered associations between ORFans and well-known agents of pathogenesis such as pathogenicity islands (PAIs), phages, plasmids and other mobile genetic elements, and that such associations could be revealed by a comparative study of ORFans in pathogenic and non-pathogenic genomes of the same species.

Previously, we developed a mathematical function to quantitatively score the uniqueness of genes in a genome and applied it to the identification of ORFans in 277 prokaryotic genomes and 1,456 viral genomes. We found that every genome studied contained a significant number of ORFans, although the percentage of ORFans varies considerably among species. Overall, ~14% of prokaryotic genes and ~30% of viral genes are ORFans. In this proposal, we will take into account the fact that new genes (ORFans) have arisen continuously during evolution. We will develop a new computer program, implementing a new algorithm, that not only identifies ORFans automatically but also classifies them into groups of different ages. We will also apply this new program to 8,431 genomes of 195 human pathogenic bacterial species.
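
To make the idea of grouping ORFans by age more concrete, the sketch below shows one possible approach; it is not the algorithm proposed here, which the abstract does not describe. It assumes that a gene's age can be approximated by the narrowest taxonomic rank that still contains all genomes with a homolog. The rank list, the function name `classify_gene_age`, and the lineage dictionaries are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical taxonomic ranks, ordered from narrowest (youngest gene)
# to broadest (oldest gene still counted as an ORFan in this sketch).
RANKS: List[str] = ["strain", "species", "genus", "family"]


def classify_gene_age(
    query_lineage: Dict[str, str],
    hit_lineages: List[Dict[str, str]],
) -> str:
    """Return the narrowest rank whose taxon contains every homolog hit.

    query_lineage maps rank -> taxon name for the genome carrying the gene.
    hit_lineages holds the same mapping for every genome with a homolog.
    A gene with no hits at all is labeled a strain-level ORFan; a gene
    with hits outside the broadest rank is labeled "non-ORFan".
    """
    for rank in RANKS:
        # all() over an empty list is True, so a gene without hits
        # is assigned to the narrowest rank.
        if all(hit.get(rank) == query_lineage[rank] for hit in hit_lineages):
            return f"{rank}-level ORFan"
    return "non-ORFan"


# Illustrative lineages (assumed inputs, not data from the project).
query = {
    "strain": "K-12",
    "species": "Escherichia coli",
    "genus": "Escherichia",
    "family": "Enterobacteriaceae",
}
other_e_coli = {**query, "strain": "O157:H7"}
salmonella = {
    "strain": "LT2",
    "species": "Salmonella enterica",
    "genus": "Salmonella",
    "family": "Enterobacteriaceae",
}

print(classify_gene_age(query, []))
print(classify_gene_age(query, [other_e_coli]))
print(classify_gene_age(query, [other_e_coli, salmonella]))
```

In this sketch, a gene found only in one strain would be treated as the youngest, and a gene shared across a whole family as older. A real tool would also need a homology search step and thresholds for deciding what counts as a hit, which are left out here.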

Public Health Relevance

The fact that orphan genes (also known as new genes or ORFans) are associated with pathogenicity islands and prophages makes a comparative study of ORFans between pathogenic and non-pathogenic bacterial genomes highly valuable for understanding bacterial pathogenesis. This proposal focuses on ORFans that occur in bacteria pathogenic to humans; the results will benefit further studies leading to potential diagnostic markers, drug targets and vaccines for the treatment and prevention of infectious diseases.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Academic Research Enhancement Awards (AREA) (R15)
Project #: 1R15GM114706-01
Application #: 8878703
Study Section: Prokaryotic Cell and Molecular Biology Study Section (PCMB)
Program Officer: Brazhnik, Paul
Project Start: 2015-03-01
Project End: 2018-02-28
Budget Start: 2015-03-01
Budget End: 2018-02-28
Support Year: 1
Fiscal Year: 2015
Total Cost: $373,400
Indirect Cost: $103,450

Name: Northern Illinois University
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 001745512
City: De Kalb
State: IL
Country: United States
Zip Code: 60115
Huo, Luyang; Zhang, Han; Huo, Xueting et al. (2017) pHMM-tree: phylogeny of profile hidden Markov models. Bioinformatics 33:1093-1095
Ekstrom, Alex; Yin, Yanbin (2016) ORFanFinder: automated identification of taxonomically restricted orphan genes. Bioinformatics 32:2053-5
Hu, Liwei; Taujale, Rahil; Liu, Fang et al. (2016) Draft genome sequence of Talaromyces verruculosus ("Penicillium verruculosum") strain TS63-9, a fungus with great potential for industrial production of polysaccharide-degrading enzymes. J Biotechnol 219:5-6
Nguyen, Marcus; Ekstrom, Alex; Li, Xueqiong et al. (2015) HGT-Finder: A New Tool for Horizontal Gene Transfer Finding and Application to Aspergillus genomes. Toxins (Basel) 7:4035-53