Proteins are the workhorse molecules of life, participating in nearly every cellular process, including signal transduction, enzyme catalysis, structural support, bodily movement, and defense against pathogens. Interpreting the specific functional role that each protein plays in the cell is thus critical for understanding the fundamental principles of biological processes and for designing new drug treatments that regulate those processes to improve human health. The task, however, is highly non-trivial in modern molecular biology. The most accurate way to determine a protein's biological function is through structural biology and biochemistry experiments. But such experiments are costly, and they are too slow for large-scale application because they depend on manual skill and data processing. As a result, the functions of the majority of proteins in humans and other important species remain unknown despite decades of effort. The lack of genome-wide protein function information has significantly impeded progress in systems biology studies that aim at a comprehensive understanding of life processes. In this project, the investigators plan to develop advanced computational methods for automated yet reliable protein function annotation. The resulting methods and databases will be freely released to the scientific community for use in large-scale, genome-wide protein function annotation studies. The project will also provide opportunities to promote the participation of underrepresented groups, including women and African Americans, in computational biology education and method development.

A routine approach to computational protein function annotation is comparative modeling, which rests on the assumption that similar sequences have similar functions and deduces the functions of target proteins from known homologous proteins. However, the accuracy and coverage of this approach are limited by the diversity of gene evolution. Significant progress has recently been achieved in protein 3D structure prediction, and state-of-the-art algorithms can now generate high-quality structures for distant-homology proteins with unprecedented capacity. This project seeks to explore new ideas for enhancing the accuracy of distant-homology protein function annotation by using 3D models from cutting-edge protein structure prediction, with a focus on ligand-protein binding interactions, gene ontology, and post-translational modifications. In addition, thermal motion and intrinsic disorder of protein structures will be integrated into the pipelines for better function annotation. The proposed approaches, unlike first-principle methods, are not expected to address all the fundamental questions of how and why proteins fold and function. Nevertheless, the success of the studies should help establish a practical knowledge-based relation between structure and function that can be used for genome-scale applications, yielding models useful for guiding new experimental design, and thus significantly enhance the impact of protein structure modeling on biological studies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
2003019
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2019-09-25
Budget End
2023-07-31
Support Year
Fiscal Year
2020
Total Cost
$150,016
Indirect Cost
Name
Wichita State University
Department
Type
DUNS #
City
Wichita
State
KS
Country
United States
Zip Code
67260