The goal of this project is to bring to eukaryotic gene finding the self-training paradigm that our group previously developed successfully for the prokaryotic gene-finding algorithms GeneMark and GeneMark.hmm. This objective is of the utmost practical importance given the growing number of eukaryotic genomes undergoing large-scale sequencing. A better understanding of the genome-based biology of these organisms, including human, is critical for studying, controlling, and preventing human diseases. We plan to develop an ab initio algorithm for gene identification in anonymous eukaryotic genomes, with gene models built by an unsupervised learning procedure. We also plan to develop an extension of this algorithm that integrates existing extrinsic information (cDNA and protein sequence data) into the gene identification and self-training procedures. Finally, we plan to develop novel algorithmic tools for generating additional extrinsic information from available sequence data. These tools will be useful both for the self-trained gene-finding algorithms and for the characterization of newly predicted proteins.