The goal of this project is to bring to eukaryotic gene finding the self-training paradigm that our group previously developed for the prokaryotic gene finding algorithms GeneMark and GeneMark.hmm. This objective is of the utmost practical importance given the growing number of eukaryotic genomes undergoing large-scale sequencing efforts. A better understanding of the genome-based biology of these organisms, including human, is critically important for studying, controlling, and preventing human diseases. We plan to develop an ab initio algorithm for gene identification in anonymous eukaryotic genomes, with gene models built by an unsupervised learning procedure. We also plan to develop an extension of this algorithm that integrates existing extrinsic information (cDNA, protein sequence data) into the gene identification and self-training procedures. Finally, we plan to develop novel algorithmic tools for generating additional extrinsic information from available sequence data. These tools will be useful both for the self-trained gene finding algorithms and for the characterization of newly predicted proteins.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG000783-12
Application #: 7120163
Study Section: Special Emphasis Panel (ZRG1-BDMA (01))
Program Officer: Bonazzi, Vivien
Project Start: 1993-03-15
Project End: 2009-04-23
Budget Start: 2006-09-01
Budget End: 2009-04-23
Support Year: 12
Fiscal Year: 2006
Total Cost: $363,022
Indirect Cost:

Name: Georgia Institute of Technology
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 097394084
City: Atlanta
State: GA
Country: United States
Zip Code: 30332

Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun et al. (2018) Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Res 28:1079-1089
Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre et al. (2016) BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767-9
Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat et al. (2016) NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 44:6614-24
Tang, Shiyuyun; Lomsadze, Alexandre; Borodovsky, Mark (2015) Identification of protein coding regions in RNA transcripts. Nucleic Acids Res 43:e78
Wu, G Albert; Prochnik, Simon; Jenkins, Jerry et al. (2014) Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotechnol 32:656-62
Borodovsky, Mark; Lomsadze, Alex (2014) Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite. Curr Protoc Microbiol 32:Unit 1E.7.
Lomsadze, Alexandre; Burns, Paul D; Borodovsky, Mark (2014) Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res 42:e119
Burns, Paul D; Li, Yang; Ma, Jian et al. (2014) UnSplicer: mapping spliced RNA-Seq reads in compact genomes and filtering noisy splicing. Nucleic Acids Res 42:e25
Li, Yang; Li-Byarlay, Hongmei; Burns, Paul et al. (2013) TrueSight: a new algorithm for splice junction detection using RNA-seq. Nucleic Acids Res 41:e51
Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark (2013) GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences. Nucleic Acids Res 41:D152-6

Showing the most recent 10 out of 48 publications