The project will provide new technology to accelerate the discovery of protein structures and assemblies from humans and other species. Knowing the 3D structures of proteins has important biomedical implications for the development of protein-based therapies and targeted therapeutic drugs, but the 3D structures of thousands of important protein types remain unsolved. The project aims to close this gap, based on two recent advances: (1) the rapid development of new DNA sequencing technologies and (2) a recent breakthrough in protein 3D structure prediction using statistical physics and biomolecular computation. The new structure prediction method, developed by the applicant team, uses a maximum entropy method to extract evolutionary residue-residue couplings from multiple sequence alignments. The team will use the evolutionary couplings as distance constraints to predict the structures of many single domains, multidomain proteins and protein complexes, and to map functional sites on known and predicted structures, with potentially broad impact on diverse areas of biological research. The team will also aim to aid the development of hybrid computational-experimental technologies for structure determination. For X-ray crystallography, the aim is to bridge the gap between the predicted 3D structures and the basin of convergence for molecular replacement, allowing structure determination from a single native data set without the need for anomalous or derivative diffraction data sets. For NMR, the aim is to add evolutionary couplings to NMR-derived backbone and residue-residue distance information and thus reduce experimental effort and/or permit the solution of larger structures. The project is a close collaboration between the Computational Biology Program at Memorial Sloan-Kettering Cancer Center and the Department of Systems Biology at Harvard Medical School. Experimental collaborations with PSI:Biology centers and the international structural genomics effort will aim to implement a more efficient technology for the determination of biomedically relevant protein structures.
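The abstract only names the approach: residue-residue couplings inferred from a multiple sequence alignment with a maximum entropy model, then used as distance constraints. As a rough illustration, and not the team's implementation, the sketch below uses the mean-field approximation to the maximum entropy (Potts) model, scores each pair by the Frobenius norm of its couplings with average product correction (APC), and keeps the top-ranked pairs that are far apart in sequence as candidate contact restraints. The function names, the pseudocount of 0.5, the minimum sequence separation of 5, and the absence of sequence reweighting are all assumptions made for brevity; production pipelines commonly use pseudolikelihood maximization instead and down-weight redundant sequences.

```python
# Hypothetical sketch: mean-field maximum entropy couplings, no sequence reweighting.
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap; last symbol is the reference state


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_field_scores(msa, pseudocount=0.5, alphabet=ALPHABET):
    """Score residue pairs (i < j) by the APC-corrected Frobenius norm of mean-field couplings."""
    q, length, n_seq = len(alphabet), len(msa[0]), len(msa)
    index = {symbol: k for k, symbol in enumerate(alphabet)}
    seqs = [[index[c] for c in s] for s in msa]
    lam = pseudocount
    # Single-site frequencies, mixed with a uniform pseudocount.
    fi = [[lam / q] * q for _ in range(length)]
    for s in seqs:
        for i, a in enumerate(s):
            fi[i][a] += (1 - lam) / n_seq
    # Pair frequencies for i < j, mixed with the same pseudocount.
    fij = {}
    for i, j in combinations(range(length), 2):
        counts = [[lam / q**2] * q for _ in range(q)]
        for s in seqs:
            counts[s[i]][s[j]] += (1 - lam) / n_seq
        fij[i, j] = counts
    # Connected correlation matrix C over the first q-1 states of every column.
    k = q - 1
    cov = [[0.0] * (length * k) for _ in range(length * k)]
    for i in range(length):
        for j in range(length):
            for a in range(k):
                for b in range(k):
                    if i == j:
                        pair = fi[i][a] if a == b else 0.0
                    elif i < j:
                        pair = fij[i, j][a][b]
                    else:
                        pair = fij[j, i][b][a]
                    cov[i * k + a][j * k + b] = pair - fi[i][a] * fi[j][b]
    inv = invert(cov)
    fn = [[0.0] * length for _ in range(length)]
    for i, j in combinations(range(length), 2):
        # Mean-field couplings e_ij(a, b) = -(C^-1)_ij(a, b); the reference state stays 0.
        e = [[-inv[i * k + a][j * k + b] if a < k and b < k else 0.0 for b in range(q)]
             for a in range(q)]
        # Shift to the zero-sum gauge, then take the Frobenius norm.
        row = [sum(e[a]) / q for a in range(q)]
        col = [sum(e[a][b] for a in range(q)) / q for b in range(q)]
        mean = sum(row) / q
        fn[i][j] = fn[j][i] = sum(
            (e[a][b] - row[a] - col[b] + mean) ** 2 for a in range(q) for b in range(q)
        ) ** 0.5
    # Average product correction (APC) removes background signal shared by each column.
    col_mean = [sum(r) / (length - 1) for r in fn]
    overall = sum(col_mean) / length
    return {(i, j): fn[i][j] - col_mean[i] * col_mean[j] / overall
            for i, j in combinations(range(length), 2)}


def top_pairs(scores, n_top, min_separation=5):
    """Highest-scoring pairs at least `min_separation` apart, e.g. as candidate contact restraints."""
    allowed = [p for p in scores if p[1] - p[0] >= min_separation]
    return sorted(allowed, key=scores.get, reverse=True)[:n_top]
```

Given an alignment `msa` of equal-length strings over `ALPHABET`, a call such as `top_pairs(mean_field_scores(msa), n_top=len(msa[0]))` returns about one candidate contact per residue (an assumed cutoff); a downstream folding step would then turn each selected pair into a distance restraint. The pure-Python matrix inversion keeps the sketch dependency-free but is only practical for short alignments.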

Public Health Relevance

Knowing the 3D structures of proteins has important biomedical implications for the development of protein-based therapies and targeted therapeutic drugs. Currently, the 3D structures of thousands of important protein types remain unsolved. By combining advances in genomic sequencing with statistical physics and biomolecular computation, the project will provide new technology to accelerate the discovery of protein structures from humans as well as pathogenic microorganisms, such as bacteria, viruses and fungi.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM106303-02
Application #
8658439
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Edmonds, Charles G
Project Start
2013-05-03
Project End
2017-04-30
Budget Start
2014-05-01
Budget End
2015-04-30
Support Year
2
Fiscal Year
2014
Total Cost
Indirect Cost
City
New York
State
NY
Country
United States
Zip Code
10065
Riesselman, Adam J; Ingraham, John B; Marks, Debora S (2018) Deep generative models of genetic variation capture the effects of mutations. Nat Methods 15:816-822
Hopf, Thomas A; Green, Anna G; Schubert, Benjamin et al. (2018) The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics :
Sjodt, Megan; Brock, Kelly; Dobihal, Genevieve et al. (2018) Structure of the peptidoglycan polymerase RodA resolved by evolutionary coupling analysis. Nature 556:118-121
Zheng, Sanduo; Sham, Lok-To; Rubino, Frederick A et al. (2018) Structure and mutagenic analysis of the lipid II flippase MurJ from Escherichia coli. Proc Natl Acad Sci U S A 115:6709-6714
Maddamsetti, Rohan; Hatcher, Philip J; Green, Anna G et al. (2017) Core Genes Evolve Rapidly in the Long-term Evolution Experiment with Escherichia coli. Genome Biol Evol :
Hopf, Thomas A; Ingraham, John B; Poelwijk, Frank J et al. (2017) Mutation effects predicted from sequence co-variation. Nat Biotechnol 35:128-135
Nicoludis, John M; Vogt, Bennett E; Green, Anna G et al. (2016) Antiparallel protocadherin homodimers use distinct affinity- and specificity-mediating regions in cadherin repeats 1-4. Elife 5:
Weinreb, Caleb; Riesselman, Adam J; Ingraham, John B et al. (2016) 3D RNA and Functional Interactions from Evolutionary Couplings. Cell 165:963-75
Toth-Petroczy, Agnes; Palmedo, Perry; Ingraham, John et al. (2016) Structured States of Disordered Proteins from Genomic Sequences. Cell 167:158-170.e12
Grad, Yonatan H; Harris, Simon R; Kirkcaldy, Robert D et al. (2016) Genomic Epidemiology of Gonococcal Resistance to Extended-Spectrum Cephalosporins, Macrolides, and Fluoroquinolones in the United States, 2000-2013. J Infect Dis 214:1579-1587

Showing the most recent 10 out of 18 publications