Predicting the three-dimensional structure of a protein solely from its primary sequence remains a grand and elusive challenge in modern computational biology. Molecular dynamics simulation holds great promise for predicting protein structures and folding pathways in molecular detail. Recent advances in computer hardware and enhanced sampling methods have made it possible to fold larger proteins ab initio. The most prominent hardware advance is Anton, a massively parallel special-purpose supercomputer designed by D. E. Shaw Research. Anton successfully folded the D14A fast-folding mutant of the 80-residue λ-repressor, with folding achieved at 49 microseconds (μs) in 643-μs-long simulations. On the other hand, the latest advance in enhanced sampling methods is the single-copy continuous simulated tempering (CST) method developed by the PI's group. The group of Dr. Klaus Schulten incorporated the CST method into the NAMD package and, on a conventional computing platform, repeatedly folded the 80-residue λ-repressor HG mutant from a fully extended conformation to the native state at 0.5 and 4 μs in 10-μs-long simulations, with a Cα root-mean-square deviation (Cα-RMSD) of 1.7 Å. In marked contrast, complete folding of the same protein was NOT observed with Anton at multiple temperatures, even in 100-μs-long simulations. No other sampling method for similar purposes has matched this folding performance of CST on conventional computing platforms. Most recently, to further enhance sampling efficiency for larger systems, the PI has developed a more powerful parallel CST (PCST) method. Initial ab initio folding simulations of trp-cage clearly demonstrated that PCST facilitated multiple folding and unfolding events far more efficiently than CST. The PCST method serves as a solid foundation for the proposed research in three Specific Aims: (1) development of the PCST method for enhanced sampling; (2) design of advanced temperature-dependent restraint schemes for targeted sampling; and (3) development of advanced blind model selection methods for efficient target selection. Our in-depth preliminary studies demonstrate that these new methods clearly outperform all existing methods and suggest a high likelihood of success for the proposed research. Ultimately, these powerful new algorithms will provide urgently needed tools for protein simulations and offer an effective solution for structural refinement in experimental X-ray crystallography and electron cryo-microscopy.

Public Health Relevance

Predicting the three-dimensional structure of a protein solely from its primary sequence remains a grand and elusive challenge in modern computational biology. The proposed study aims to develop a set of computational tools that will bring us closer to this goal. Implementing these computational methods and releasing them to the entire scientific community will expedite the pursuit of high-accuracy structures of biomedically important proteins, directly benefiting disease prevention and treatment.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Lyster, Peter
Baylor College of Medicine
Schools of Medicine
United States
Du, Junqing; Kirk, Brian; Zeng, Jia et al. (2018) Three classes of response elements for human PRC2 and MLL1/2-Trithorax complexes. Nucleic Acids Res 46:8848-8864
Lin, Xingcheng; Noel, Jeffrey K; Wang, Qinghua et al. (2018) Atomistic simulations indicate the functional loop-to-coiled-coil transition in influenza hemagglutinin is not downhill. Proc Natl Acad Sci U S A 115:E7905-E7913