High accuracy computational methods for biomolecular nuclear magnetic resonance spectroscopy

Nuclear magnetic resonance (NMR) spectroscopy is one of the most important condensed phase probes of the composition, structure and dynamics of biomolecules and bio-organic species. NMR observables such as chemical shifts and spin-spin splittings can be measured to very high accuracy. Because they are sensitive to biological functional groups, detailed geometries, and chemical environments, they can be used to predict solution phase protein structures and to identify or verify the structures of chemical compounds in the crystalline phase. The connection to structure, while true in principle, is nevertheless sometimes difficult to reveal in practice through direct assignment of the spectrum. Simulation methods that accurately predict spectral observables from structure are therefore a key goal for NMR spectral assignment. Such methods are even more crucial for the inverse problem of obtaining high quality NMR structures of folded proteins from spectra, and as powerful restraints for determining the structural ensembles of intrinsically disordered proteins (IDPs). Existing approaches to this problem typically rely on semi-empirical heuristics; while they have achieved considerable success, they also have limitations that significantly degrade the quality of structural prediction. In this equipment supplement we propose to acquire a dedicated compute cluster for high throughput chemical shift calculations with the wavefunction-based QM methods we have developed, which offer improved accuracy over DFT. The cluster will be used to populate databases of proteins and drug-relevant small molecules for the machine learning methods we have developed under NIH support. With such data, machine learning and deep networks will determine a quantitative relationship between structure and computed NMR observables, and the resulting data science driven methods will be tested on the refinement of folded protein structures and on structure prediction for small molecule drugs.

Public Health Relevance

NMR chemical shift and spin-spin splitting measurements on organic materials containing ¹H, ²H, ¹³C, and ¹⁵N nuclei can provide detailed descriptions of the structures of drug molecules, folded proteins and their complexes, as well as the structural ensembles of intrinsically disordered proteins (IDPs). However, current limitations in turning accurate NMR measurements into accurate structures clearly hamper progress in connecting the structure of proteins to their function in native aqueous environments. This supplemental equipment proposal aims to apply machine learning to QM data generated on a newly purchased high-throughput GPU/CPU cluster, which will significantly improve the ability to predict chemical shifts and indirect spin-spin couplings, and hence the spectrum associated with a given folded structure or drug crystal, with very high accuracy and good computational efficiency.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01GM121667-04S1
Application #: 10145510
Program Officer: Preusch, Peter
Project Start: 2017-02-01
Project End: 2021-01-31
Budget Start: 2020-02-01
Budget End: 2021-01-31
Support Year: 4
Fiscal Year: 2020
Organization Name: University of California Berkeley
Department: Chemistry
Organization Type: Schools of Arts and Sciences
DUNS #: 124726725
City: Berkeley
State: CA
Country: United States
Zip Code: 94710
Sasmal, Sukanya; Lincoff, James; Head-Gordon, Teresa (2017) Effect of a Paramagnetic Spin Label on the Intrinsically Disordered Peptide Ensemble of Amyloid-β. Biophys J 113:1002-1011