Protein Sequence Matching

Jernigan, Robert; Kloczkowski, Andrzej

Abstract

Combining information from the vast body of protein sequences within the framework of protein structures enables a deeper understanding of the complex effects of amino acid substitutions. Compiling sequence correlations within protein structural domains will make it possible to better distinguish between neutral and deleterious changes. Protein structures provide the frameworks for interpreting the sequence data, both through the physical proximity of directly interacting amino acids and through the manifestation of allostery. This will transform sequence matching from a 1-D process into a 3-D process. Owing to rapid advances in sequencing, the large number of available genomes now provides hundreds of millions of protein sequences, and similar advances in structural biology now provide more than 100,000 protein structures. Our preliminary results from combining these data show that accounting for sequence correlations between pairs of amino acids that interact closely in protein structures immediately improves the ability to identify similar structures by sequence matching. Other preliminary data show that function identification by sequence matching is also improved. Such improved homolog identification can lead to progress in structure prediction. The overarching goal is to apply deep knowledge of protein structure, together with analyses of the available sequence data, to the important problem of protein sequence matching. We take an entirely new, highly innovative and uniquely multi-faceted approach to this problem. It is well established that physical factors, such as the dense packing of amino acids and other physical aspects of structures, affect the conservation of amino acids, and the new sequence-matching approaches taken here account for these factors.
The rationale is that protein structures provide the physical information and the framework needed to incorporate aspects of 3-D structure and allostery into sequence matching. Accounting for protein flexibility and conformational dynamics will further broaden the conformational space investigated and provide a better understanding of the correlations important for sequence evolution. Results from this project will improve the practice of molecular biology, particularly the identification of functions for proteins with no assigned function, and this is certain to have major impacts on the understanding of evolution. This project will apply innovative new methods for extracting correlations in sequence, structure and dynamics by data mining sequences and structures. The novel structure-based approaches will enable major advances in sequence matching, which will be implemented and disseminated on new web servers available to anyone. The outcomes of the project will enable any scientist to discriminate significantly more effectively between similar and dissimilar sequences. This better discrimination is essential for better function prediction, for a better understanding of evolution, for better identification of non-functional protein mutants, for improved protein design, for medical diagnosis, and for medical practice in the era of individual patient genomes.
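
The abstract does not specify how pairwise correlations between structurally interacting residues are scored. As an illustration only, the Python sketch below shows one simple way a sequence comparison could reward conservation of residue pairs that are in contact in a template structure. It assumes an ungapped alignment of equal length, identity-only scoring, contacts defined by a hypothetical 8 Å C-alpha distance cutoff with a minimum sequence separation of 3, and a hypothetical pair weight of 0.5. The function names `contact_pairs` and `pair_aware_score` and the toy coordinates are invented for this example; this is not the method used in the project.

```python
import math
from itertools import combinations

# Illustrative sketch only; not the scoring method used in this project.
# Cutoff, separation and pair weight below are hypothetical choices.


def contact_pairs(coords, cutoff=8.0, min_sep=3):
    """Return residue index pairs (i, j), i < j, whose C-alpha atoms lie
    within `cutoff` angstroms and that are at least `min_sep` apart in
    sequence (near neighbours along the chain are always close, so they
    are skipped)."""
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff
    ]


def pair_aware_score(query, template, contacts, pair_weight=0.5):
    """Toy score for an ungapped, equal-length alignment.

    Single-site term: +1 for each position where query and template agree.
    Pair term: +pair_weight for each structural contact (i, j) in which the
    query preserves BOTH template residues, so a contacting pair is credited
    as a unit rather than as two independent sites.
    """
    if len(query) != len(template):
        raise ValueError("this sketch assumes an ungapped, equal-length alignment")
    same = [q == t for q, t in zip(query, template)]
    single = sum(same)
    pair = sum(1 for i, j in contacts if same[i] and same[j])
    return single + pair_weight * pair


# Toy 5-residue template: C-alpha coordinates in angstroms (invented).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]
contacts = contact_pairs(coords)
print(contacts)                                       # [(0, 4), (1, 4)]
# Both queries match 4 of 5 template residues ("ACDEF").
print(pair_aware_score("ACGEF", "ACDEF", contacts))   # 5.0: mutation at non-contact site 2
print(pair_aware_score("ACDEG", "ACDEF", contacts))   # 4.0: mutation breaks both contacts
```

Both queries have the same sequence identity to the template, but only the first keeps both structural contacts intact, so the pair-aware score separates two candidates that a plain identity count would rank equally.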

Public Health Relevance

In the era of patient genomes, individualized medicine will rely on gene sequencing and the knowledge derived from it; rapidly understanding how different mutants behave becomes a critical element for diagnosis and for developing individual therapies. Combining big data from protein sequences and structures will make it possible to understand the effects of mutations computationally, by means of structure-principled sequence matching. Our project will develop robust new tools for use in precision medicine, and thus will directly and broadly impact public health.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM127701-03
Application #: 9851415
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Lyster, Peter

Project Start: 2018-02-01
Project End: 2022-01-31
Budget Start: 2020-02-01
Budget End: 2021-01-31
Support Year: 3
Fiscal Year: 2020

Institution

Name: Iowa State University
Type: Organized Research Units
DUNS #: 005309844

City: Ames
State: IA
Country: United States
Zip Code: 50011

Related projects


NIH 2021 R01 GM	Protein Sequence Matching Jernigan, Robert L.; Kloczkowski, Andrzej / Iowa State University
NIH 2020 R01 GM	Protein Sequence Matching Jernigan, Robert L.; Kloczkowski, Andrzej / Iowa State University
NIH 2019 R01 GM	Protein Sequence Matching Jernigan, Robert L.; Kloczkowski, Andrzej / Iowa State University
NIH 2018 R01 GM	Protein Sequence Matching Jernigan, Robert L.; Kloczkowski, Andrzej / Iowa State University
