Addressing Open Challenges of Computational Genome Annotation

Borodovsky, Mark

Abstract

We propose to capitalize on the success of the ongoing collaboration between the bioinformatics teams at the University of Greifswald (Germany) and the Georgia Institute of Technology (USA) to address open challenges in computational genome annotation. In the course of this development, we plan to implement new algorithmic ideas and to meet the need for unbiased integration of different types of OMICS data. We plan to address one of the long-standing problems at the interface of bioinformatics and machine learning: automatic generative and discriminative parameterization of gene finding algorithms. Current methods of combining OMICS evidence frequently result in tools that either under-predict or over-predict. Having a good understanding of the difficulties and the properties of different types of OMICS evidence, we propose an optimized approach to full unsupervised, generative and discriminative training. We will introduce novel means to optimize the integration of multiple types of OMICS evidence into gene prediction. These ideas will further develop the protein family-based gene finding implemented in AUGUSTUS-PPX. We propose to create representations of protein families for gene finding that, for the first time, include cross-species gene structure information. We will develop a new approach that unifies two advanced research areas: transcript reconstruction from RNA-Seq, and statistical gene finding that integrates RNA-Seq and homology information. We will describe a new, comprehensive model and an EM-like algorithmic technique (the "wholistic" approach) to identify the sets of transcripts and their expression levels that best fit the available OMICS evidence. We will also develop an automatic gene-finding algorithm for the full content of metagenomes, including eukaryotic and viral metagenomic sequences. This task is conventionally considered too challenging. We propose a solution that exploits and advances the algorithmic ideas and approaches we mastered in the course of creating gene finders for prokaryotic metagenomes as well as eukaryotic genomes. All new tools will be available to the community under open source licenses.

Public Health Relevance

The goal of this project is to advance the science of genome interpretation by developing much-needed computational methods and tools for high-precision annotation of eukaryotic genomes and metagenomes. This advance will have an impact on research on model and non-model organisms, including important human pathogens, parasites and viruses. New high-throughput technologies generate large volumes of sequence data on complex genomes as well as metagenomes. Still, these data volumes have to be transformed into scientific knowledge. Our new bioinformatics tools, matching the latest sequencing technology in speed and performance, will make a significant impact on genomic research aimed at an ultimate understanding of human health and disease.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM128145-02
Application #: 9761554
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Ravichandran, Veerasamy

Project Start: 2018-09-01
Project End: 2021-06-30
Budget Start: 2019-07-01
Budget End: 2020-06-30
Support Year: 2
Fiscal Year: 2019

Institution

Name: Georgia Institute of Technology
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 097394084

City: Atlanta
State: GA
Country: United States
Zip Code: 30332

Related projects


NIH 2020 R01 GM	Addressing Open Challenges of Computational Genome Annotation Borodovsky, Mark / Georgia Institute of Technology
NIH 2019 R01 GM	Addressing Open Challenges of Computational Genome Annotation Borodovsky, Mark / Georgia Institute of Technology
NIH 2018 R01 GM	Addressing Open Challenges of Computational Genome Annotation Borodovsky, Mark / Georgia Institute of Technology
