Given the recent explosion in the number of sequenced genomes and the relative lack of functional information about their contents, annotating the biological functions of all proteins across different genomes is a major challenge for modern molecular and computational biology. The problem is particularly acute for bacteria: a vast range of commensal and pathogenic bacterial species affect human health, and only computational approaches, appropriately combined with carefully targeted biochemical experiments, can provide the reliable, high-throughput annotations needed to understand their physiology. Current computational function prediction relies mainly on transferring annotations from known proteins of similar sequence, which becomes increasingly unreliable as the level of homology drops. Meanwhile, protein 3D structure prediction has made significant progress, as shown by community-wide blind-testing experiments, and state-of-the-art methods can now construct correct folds for the majority of genome sequences without using close homologous templates. Building on the hypothesis that biological function is more directly associated with 3D structure than with sequence, this proposal aims to initiate a paradigm shift from protein structure prediction to structure-based function annotation. Combining expertise in computational biology, microbiology, and structural biology, the PIs will systematically examine how, and to what extent, computational models from cutting-edge structure-prediction methods can provide reliable high-throughput annotations of bacterial genomes, with a particular focus on difficult targets that existing sequence homology-based approaches cannot address. The project will develop and test several new approaches to protein function prediction that use low-resolution (but correctly folded) models generated by structure prediction.
The specific aims include developing novel structure-based methods for modeling protein-ligand binding sites and for predicting enzyme classifications and Gene Ontology (GO) terms. The modeling methods and their results will be tested through a set of carefully designed experiments, including high-throughput chemical screening and detailed structural-biology characterization. At all stages, iterative prediction-experiment-refinement loops linking the computational annotations with the experiments will guide the development of the function-modeling methods. The project will focus on the E. coli K-12 strain, for which >10% of the genome remains unannotated despite its long history as a model organism; the long-term goal, however, is to build a robust framework that can serve as a resource for reliable function annotation of many other microbial genomes. If successful, the structure-based pipelines could bring nearly 10 million (or 30%) of the targets in current genome databases that have no or only distant homologs into the regime of reliable function annotation, beyond the reach of current sequence-based approaches.

Public Health Relevance

Thousands of different types of bacteria contribute to human health and disease. One of the key challenges in modern biomedicine is leveraging the genome sequences of these bacteria to understand how the organisms function. This project aims to develop new methods, based on computational protein structure prediction combined with biochemical experiments, to annotate bacterial genomes, which should provide critical guidance for new drug discovery that can help improve human health.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Research Project (R01)
Project #
5R01AI134678-03
Application #
9976447
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Shabman, Reed Solomon
Project Start
2018-08-01
Project End
2022-07-31
Budget Start
2020-08-01
Budget End
2021-07-31
Support Year
3
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109