Structure-based functional annotation of microbial genomes

Zhang, Yang

Abstract

Given the recent explosion in the number of sequenced genomes and the relative lack of functional information on their contents, annotating the biological functions of all proteins across different genomes represents a major challenge to modern molecular and computational biology. The problem of genome annotation is particularly acute for bacteria: a vast range of commensal and pathogenic bacterial species impact human health, and only computational approaches, when appropriately combined with carefully targeted biochemical experiments, can provide the reliable, high-throughput annotations necessary to understand their physiology. The current approach to computational function prediction is mainly based on transfer from known proteins of similar sequence, which becomes increasingly unreliable as the level of homology decreases. Recently, significant progress has been achieved in protein 3D structure prediction, as witnessed by the community-wide blind testing experiments, and current state-of-the-art methods can construct correct protein folds for the majority of genome sequences without using closely homologous templates. Building on the hypothesis that biological function is more directly associated with 3D structure than with sequence, this proposal aims to initiate a paradigm shift from protein structure prediction to structure-based function annotation. Combining expertise from computational biology, microbiology, and structural biology, the PIs will systematically examine the potential and scope of computational structure models from cutting-edge modeling methods to provide reliable high-throughput annotations of bacterial genomes, with a particular focus on the difficult targets that cannot be addressed by existing sequence homology-based approaches. This project is designed to develop and test several cutting-edge approaches for protein function prediction using low-resolution (but correctly folded) models from structure prediction.
The specific aims include the development of novel structure-based methods for modeling protein-ligand binding sites and enzyme and gene ontologies. The modeling methods and results will be tested by a set of carefully designed experiments, including high-throughput chemical screening and detailed structural biology-based characterizations. At all stages, iterative prediction-to-experiment-to-refinement loops will be established between the experiments and the computational annotations to guide the development of the functional modeling methods. The studies in this project will focus on the E. coli K12 strain, for which >10% of the genome remains unannotated despite a long history of use as a model organism; the long-term goal, however, is to build a novel and robust framework that can serve as a resource for reliable function annotation of various other microbial genomes. Compared with current sequence-based approaches, the success of the structure-based pipelines could potentially bring nearly 10 million (or 30%) of the non- or distant-homologous targets in the current genome database into the reliable function annotation regime.

Public Health Relevance

Thousands of different types of bacteria contribute to human health and disease. One of the key challenges in modern biomedicine is leveraging the genomic sequences of these bacteria to understand how the organisms function. This project aims to develop new methods based on computational protein structure prediction and biochemical experiments to annotate bacterial genomes, which should provide critical guidance for new drug discovery that can help improve human health.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Research Project (R01)
Project #: 1R01AI134678-01A1
Application #: 9596521
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brown, Liliana L

Project Start: 2018-08-01
Project End: 2022-07-31
Budget Start: 2018-08-01
Budget End: 2019-07-31
Support Year: 1
Fiscal Year: 2018

Institution

Name: University of Michigan Ann Arbor
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 073133571

City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Related projects


NIH 2020 R01 AI	Structure-based functional annotation of microbial genomes Zhang, Yang / University of Michigan Ann Arbor
NIH 2019 R01 AI	Structure-based functional annotation of microbial genomes Zhang, Yang / University of Michigan Ann Arbor
NIH 2018 R01 AI	Structure-based functional annotation of microbial genomes Zhang, Yang / University of Michigan Ann Arbor
