The goal of the project is to develop powerful new methods of gene identification that draw on the experience and ideas of the proven GeneMark method. Two important problems arise in sequencing experiments: prediction of pioneer genes, in both prokaryotes and eukaryotes, and prediction of gene structure in eukaryotic DNA. We plan to develop a self-training program, GeneMark-Genesis, that will carry out model learning and gene prediction in parallel in newly sequenced prokaryotic genomes, including the classification of identified genes into several classes. We intend to improve the accuracy of the GeneMark method for prokaryotic gene identification by developing a GeneMark-HMM version that will make use of minor statistical patterns. In eukaryotes, the family of GeneMark programs will be extended by a program, GeneMark-H, that combines Markov models for protein-coding and non-coding regions with Markov models for splice sites and other boundary sites. Another potentially powerful development is the NetGeneMark program, which integrates a high-quality splice site detection system, NetGene, with prediction of coding potential by GeneMark. All methods are intended to work on raw DNA sequence, for whose analysis and interpretation there is an increasing need in the Human Genome Project.