High accuracy computational methods for biomolecular nuclear magnetic resonance spectroscopy

Nuclear magnetic resonance (NMR) spectroscopy is one of the most important condensed phase probes of the composition, structure and dynamics of biomolecules and bio-organic species. NMR observables such as chemical shifts and spin-spin splittings can be measured to very high accuracy. Because they are sensitive to biological functional groups, detailed geometries, and chemical environments, they allow prediction of solution phase protein structures and identification or verification of the structures of chemical compounds in the crystalline phase. The connection to structure, while true in principle, is nevertheless sometimes difficult to reveal in practice through direct assignment of the spectrum. Simulation methods that accurately predict spectral observables from structure are a key goal for NMR spectral assignment. Such methods are even more crucial for the inverse problem of realizing high quality NMR structures of folded proteins from spectra, and for providing powerful restraints when determining the structural ensembles of intrinsically disordered proteins (IDPs). Existing approaches to this problem typically rely on semi-empirical heuristics, and while they have achieved considerable success, they also reveal limitations that significantly degrade the quality of structural prediction. In this equipment supplement we propose to acquire a dedicated compute cluster for high throughput calculations with wavefunction-based QM methods that we have developed for chemical shifts and that offer improved accuracy over DFT. The cluster will be employed to populate databases covering proteins and small-molecule drugs for use with the machine learning methods we have developed under NIH support.
With such data, machine learning and deep networks will determine a quantitative relationship between structure and computed NMR observables, and the resulting data-science-driven methods will be tested on the refinement of folded proteins and on small-molecule drug prediction.
NMR shift and splitting measurements for organic materials containing 1H, 2H, 13C, and 15N nuclei can provide detailed descriptions of the structure of drug molecules, folded proteins and their complexes, as well as the structural ensembles of intrinsically disordered proteins (IDPs). However, current limitations in turning accurate NMR measurements into accurate structures clearly hamper advances that could be made in connecting structure to function for proteins in their native aqueous environments. This supplemental equipment proposal requests the purchase of a high-throughput GPU/CPU cluster to support machine learning on QM data. The resulting methods are intended to significantly improve the ability to predict chemical shifts and indirect spin-spin couplings, yielding the spectrum associated with a given folded structure or drug crystal with very high accuracy and good computational efficiency.
Sasmal, Sukanya; Lincoff, James; Head-Gordon, Teresa (2017) Effect of a Paramagnetic Spin Label on the Intrinsically Disordered Peptide Ensemble of Amyloid-β. Biophys J 113:1002-1011