Gene Prediction: Markov Models and Complementary Methods

Borodovsky, Mark

Abstract

The goal of the project is to build more accurate and powerful DNA sequence interpretation algorithms, drawing on the experience and ideas of the previously proven GeneMark and GeneMark.hmm methods. We plan to improve the quality of gene finding in prokaryotic genomes in terms of reliable and accurate prediction of gene starts and detection of frameshift sequencing errors. We also plan to develop an iterative machine-learning procedure for deriving all models necessary for precise gene prediction and annotation from totally anonymous prokaryotic sequences. For eukaryotic species, we will improve the accuracy of the ab initio method GeneMark.hmm by building more accurate models for splice sites and initiation/termination sites, and we will address the problem of accurately finding intergenic regions with polyadenylation sites and promoters. On the basis of GeneMark.hmm, we plan to develop an integrated gene finding approach by "projecting" pieces of diverse extrinsic evidence onto the DNA level, then translating them into DNA patterns and combining these patterns with statistical patterns of DNA coding and non-coding sequence within a generalized HMM. The most intriguing sources of this additional information are evolutionarily conserved regions in DNA sequences of closely related species, functional motifs in protein sequences, and protein sequence patterns reflecting three-dimensional structural motifs. All these newly developed methods, as well as several others mentioned in the proposal, will deal with anonymous DNA, for which interpretation is increasingly needed in the post-genomic era.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG000783-09
Application #: 6536458
Study Section: Genome Study Section (GNM)
Program Officer: Bonazzi, Vivien

Project Start: 1993-03-15
Project End: 2004-07-19
Budget Start: 2002-07-01
Budget End: 2004-07-19
Support Year: 9
Fiscal Year: 2002
Total Cost: $318,645
Indirect Cost

Institution

Name: Georgia Institute of Technology
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 097394084

City: Atlanta
State: GA
Country: United States
Zip Code: 30332

Publications

Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun et al. (2018) Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Res 28:1079-1089

Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre et al. (2016) BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767-9

Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat et al. (2016) NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 44:6614-24

Tang, Shiyuyun; Lomsadze, Alexandre; Borodovsky, Mark (2015) Identification of protein coding regions in RNA transcripts. Nucleic Acids Res 43:e78

Wu, G Albert; Prochnik, Simon; Jenkins, Jerry et al. (2014) Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotechnol 32:656-62

Borodovsky, Mark; Lomsadze, Alex (2014) Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite. Curr Protoc Microbiol 32:Unit 1E.7.

Lomsadze, Alexandre; Burns, Paul D; Borodovsky, Mark (2014) Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res 42:e119

Burns, Paul D; Li, Yang; Ma, Jian et al. (2014) UnSplicer: mapping spliced RNA-Seq reads in compact genomes and filtering noisy splicing. Nucleic Acids Res 42:e25

Li, Yang; Li-Byarlay, Hongmei; Burns, Paul et al. (2013) TrueSight: a new algorithm for splice junction detection using RNA-seq. Nucleic Acids Res 41:e51

Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark (2013) GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences. Nucleic Acids Res 41:D152-6

Showing the most recent 10 out of 48 publications
