Exploring the unknown protein universe using evolutionary information

Ovchinnikov, Sergey

Abstract

For billions of years, nature has been conducting the greatest experiment of all time. Imagine one day gaining access to the detailed notes from these experiments. Today, with worldwide expeditions to collect samples from all habitats, single-cell sequencing of unculturable microbes, and the rapid drop in sequencing costs, we can finally tap into nature and gain access to these notes. All that is missing is a Rosetta Stone to interpret this data. The traditional approach to interpreting sequence data is through comparison to known information, such as annotated genomes and/or experimentally characterized protein families. Unfortunately, nearly half of metagenomic data (coming from either environmental samples or microbiomes) lacks detectable sequence homology to any protein family, let alone to any isolated genome. Furthermore, the rate at which this "dark matter" is discovered far exceeds the rate at which experiments can be done to characterize it. An alternative approach is to learn a generative, statistical model of the evolutionary process itself. The parameters of this model should in turn provide the constraints on natural selection. For protein-coding genes, the constraints include folding, stability, and function. Recently, it was shown that a global statistical model of a protein family that captures both conservation and coevolution patterns in the family possesses this quality. The strength of the coevolution term is correlated with residue-residue contacts in the 3D structure. These contacts have since been used to computationally determine the 3D structures of hundreds of unknown protein families and complexes. These structures, in turn, have been used to predict function by looking at the arrangement of conserved residues and at structural similarity to known protein structures. Structural matches can occur in the absence of detectable sequence similarity because structural similarity is retained over larger evolutionary distances.
I propose to: 1) develop an improved, unified statistical model of protein evolution that takes into account functional and lineage constraints; 2) apply the model to mine metagenomic "dark matter" sequences for new protein families, functions, and protein-protein interactions; 3) probe the evolution of multicellularity through comparison of structures and interactions in the early tree of life. One of the results of the research will be a public database of new protein families and their predicted 3D structures and functions. These will be used by structural, molecular, and evolutionary biologists as a reference for future studies into the unknown protein universe.
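
As a rough illustration of the conservation and coevolution signals described above, the following minimal Python sketch scores the columns of a toy multiple sequence alignment. It is not the global statistical model referred to in the abstract: mutual information is a simpler, purely pairwise measure swapped in here for illustration, and the toy alignment and the function names (`column_entropy`, `mutual_information`, `rank_pairs`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy alignment: one aligned sequence per row.
# Column 1 is fully conserved; columns 0 and 2 change together.
toy_msa = [
    "ALKD",
    "ALKE",
    "GLRD",
    "GLRE",
    "ALKD",
    "GLRE",
]


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def column_entropy(msa, i):
    """Shannon entropy (in nats) of column i; 0 means fully conserved."""
    n = len(msa)
    counts = Counter(column(msa, i))
    return sum((c / n) * math.log(n / c) for c in counts.values())


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j.

    High values mean the residues at the two positions covary,
    which is a crude stand-in for a coevolution term.
    """
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for i in range(len(toy_msa[0])):
        print(f"column {i}: entropy = {column_entropy(toy_msa, i):.3f}")
    for pair, score in rank_pairs(toy_msa):
        print(f"columns {pair}: MI = {score:.3f}")
```

In this toy alignment the pair (0, 2) ranks first, which is the kind of signal that would be read as a candidate residue-residue contact. A pairwise score like this cannot separate direct from indirect correlations, which is one reason a global model of the whole family is preferred in practice.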

Public Health Relevance

The goal of the proposed research is to develop new computational tools for analysis of genomic and metagenomic data from environmental and microbiome samples (such as from human gut and other organs). One of the results of the research will be a public database of new protein families, including their predicted 3D structure and function, to be used for future studies into the unknown protein universe. For public health, it is important to characterize these proteins for both potential therapeutic purposes and as possible drug targets involved in disease.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Office of The Director, National Institutes of Health (OD)
Type: Early Independence Award (DP5)
Project #: 5DP5OD026389-03
Application #: 9990892
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Miller, Becky

Project Start: 2018-09-07
Project End: 2023-08-31
Budget Start: 2020-09-01
Budget End: 2021-08-31
Support Year: 3
Fiscal Year: 2020
Total Cost
Indirect Cost

Institution

Name: Harvard University
Department
Type: Schools of Arts and Sciences
DUNS #: 082359691

City: Cambridge
State: MA
Country: United States
Zip Code: 02138

Related projects


NIH 2020 DP5 OD	Exploring the unknown protein universe using evolutionary information Ovchinnikov, Sergey L. / Harvard University
NIH 2019 DP5 OD	Exploring the unknown protein universe using evolutionary information Ovchinnikov, Sergey L. / Harvard University
NIH 2018 DP5 OD	Exploring the unknown protein universe using evolutionary information Ovchinnikov, Sergey L. / Harvard University
