A Pneumocystis Genome Project was funded in 1999 with the goals of creating a physical map, an EST database and the assembled DNA sequence for the fungal pathogen Pneumocystis carinii (Pc). Pc causes one of the major HIV-associated infections, Pneumocystis pneumonia (PcP). The genomic sequencing of Pc is ongoing: large contiguous stretches of several chromosomes as well as a non-redundant set of ~1900 ESTs are available for analysis. Strategies for gene finding in Pc and further annotation of gene products need to be developed now to fully utilize the growing body of sequence data. In this complementary project, we propose (in accord with specific goals of PAS-02-046) to develop and apply computational tools for gene prediction and annotation for the Pc genome in order to facilitate the search for novel drug targets and potential drug candidates for PcP. Despite the growing number of fungal genomes that are being sequenced, only a few gene prediction programs for fungi have been developed. Due to specific biases in A/T content, intron/exon boundaries, promoter sequences and gene densities, as well as large differences in organization and structure between different fungal genomes, a training set of well-characterized genes and splicing signals needs to be developed for each distinct genome. Moreover, the applicability of splicing alignments that are commonly used to enhance ab initio gene predictions is limited due to the high percentage of genes that do not share similarity with sequences of known genes. Approximately half of the putative Pc genes have no identified orthologs (a situation similar to other fungi, such as yeast and Neurospora). Therefore, gene finding and annotation in Pc, as well as in other fungal genomes, represent a significant challenge. In the present proposal, as a first step toward full annotation, software and analysis tools will be developed to identify putative genes in the Pc genome.
We will take advantage of the EST database, available genomic sequences, and known Pc genes, as well as the expertise of the personnel on this proposal, to create an integrated biological-computational strategy for gene finding and annotation in Pc.
Our specific aims are: 1) To build a representative Pc gene database that will identify intron/exon boundaries and other relevant signals; 2) To develop and train Pc-specific gene recognition methods using a hierarchical strategy that combines, in a novel way, advanced pattern recognition approaches such as Support Vector Machines, Hidden Markov Models and adaptable Neural Networks.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21AI055338-01A1
Application #: 6696106
Study Section: AIDS and Related Research 8 (AARR)
Program Officer: Duncan, Rory A
Project Start: 2003-08-01
Project End: 2005-07-31
Budget Start: 2003-08-01
Budget End: 2004-07-31
Support Year: 1
Fiscal Year: 2003
Total Cost: $188,184
Indirect Cost:
Name: Cincinnati Children's Hospital Medical Center
Department:
Type:
DUNS #: 071284913
City: Cincinnati
State: OH
Country: United States
Zip Code: 45229
Howarth, Jack W; Meller, Jarek; Solaro, R John et al. (2007) Phosphorylation-dependent conformational transition of the cardiac specific N-extension of troponin I in cardiac troponin. J Mol Biol 373:706-22
Sinha, Amit U; Meller, Jaroslaw (2007) Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms. BMC Bioinformatics 8:82
Porollo, Aleksey; Meller, Jaroslaw (2007) Prediction-based fingerprints of protein-protein interactions. Proteins 66:630-45
Slaven, Bradley E; Porollo, Aleksey; Sesterhenn, Thomas et al. (2006) Large-scale characterization of introns in the Pneumocystis carinii genome. J Eukaryot Microbiol 53 Suppl 1:S151-3
Cao, Baoqiang; Porollo, Aleksey; Adamczak, Rafal et al. (2006) Enhanced recognition of protein transmembrane domains with prediction-based structural profiles. Bioinformatics 22:303-9
Slaven, Bradley E; Meller, Jaroslaw; Porollo, Aleksey et al. (2006) Draft assembly and annotation of the Pneumocystis carinii genome. J Eukaryot Microbiol 53 Suppl 1:S89-91
Wagner, Michael; Adamczak, Rafal; Porollo, Aleksey et al. (2005) Linear regression models for solvent accessibility prediction in proteins. J Comput Biol 12:355-69
Adamczak, Rafal; Porollo, Aleksey; Meller, Jaroslaw (2004) Accurate prediction of solvent accessibility using neural networks-based regression. Proteins 56:753-67
Czyzyk-Krzeska, Maria F; Meller, Jaroslaw (2004) von Hippel-Lindau tumor suppressor: not only HIF's executioner. Trends Mol Med 10:146-9