The goal of the project is to build powerful new methods of gene identification that draw on the experience and ideas behind the proven GeneMark method. Two important problems arise in sequencing experiments: prediction of pioneer genes, in both prokaryotes and eukaryotes, and prediction of gene structure in eukaryotic DNA. We plan to develop GeneMark-Genesis, a self-training program that will carry out model learning and gene prediction in parallel in newly sequenced prokaryotic genomes, including classification of the identified genes into several classes. We intend to improve the accuracy of the GeneMark method for prokaryotic gene identification by developing a GeneMark-HMM version that will make use of minor statistical patterns. In eukaryotes, the family of GeneMark programs will be extended with GeneMark-H, a program that combines Markov models for protein-coding and non-coding regions with Markov models for splice sites and other boundary sites. Another potentially powerful development is the NetGeneMark program, which integrates a high-quality splice site detection system, NetGene, with prediction of coding potential by GeneMark. All methods are intended for the analysis and interpretation of raw DNA sequence, for which there is an increasing need in the Human Genome Project.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG000783-05
Application #
2444957
Study Section
Special Emphasis Panel (ZRG2-GNM (03))
Project Start
1993-03-15
Project End
1999-06-30
Budget Start
1997-07-01
Budget End
1998-06-30
Support Year
5
Fiscal Year
1997
Name
Georgia Institute of Technology
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
097394084
City
Atlanta
State
GA
Country
United States
Zip Code
30332
Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun et al. (2018) Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Res 28:1079-1089
Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre et al. (2016) BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767-9
Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat et al. (2016) NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 44:6614-24
Tang, Shiyuyun; Lomsadze, Alexandre; Borodovsky, Mark (2015) Identification of protein coding regions in RNA transcripts. Nucleic Acids Res 43:e78
Wu, G Albert; Prochnik, Simon; Jenkins, Jerry et al. (2014) Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotechnol 32:656-62
Borodovsky, Mark; Lomsadze, Alex (2014) Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite. Curr Protoc Microbiol 32:Unit 1E.7.
Lomsadze, Alexandre; Burns, Paul D; Borodovsky, Mark (2014) Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res 42:e119
Burns, Paul D; Li, Yang; Ma, Jian et al. (2014) UnSplicer: mapping spliced RNA-Seq reads in compact genomes and filtering noisy splicing. Nucleic Acids Res 42:e25
Li, Yang; Li-Byarlay, Hongmei; Burns, Paul et al. (2013) TrueSight: a new algorithm for splice junction detection using RNA-seq. Nucleic Acids Res 41:e51
Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark (2013) GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences. Nucleic Acids Res 41:D152-6

Showing the most recent 10 out of 48 publications