Most drugs interact with protein molecules to elicit a cellular response. Traditional drug discovery is a laborious and expensive experimental process, so computational approaches that assess protein function and accelerate discovery are in high demand. Virtual drug screening and structure-based drug design are two such approaches that can play an important role in modern drug discovery and development. Both rely on high-resolution tertiary (3D) protein structures, and both are hampered by the slow and often unsuccessful methods of experimental structure determination. Protein structure prediction is poised to impact human health by accelerating the construction of high-confidence structural models of drug targets and biopharmaceuticals, which will help identify new therapeutic strategies. However, current methods are very limited in their ability to predict high-resolution models, which is preventing broad classes of therapeutics from being discovered. Technologies are also needed to predict, as early as possible, whether a candidate drug will fail in the development process. With improvements in accuracy, protein structure prediction can be used to lower drug development costs and focus experiments on the most promising drug candidates. DNASTAR recently released NovaFold, a commercial version of the world-leading I-TASSER protein folding algorithm (Yang Zhang, U. Michigan), running on a cloud computing platform. Since 2006, I-TASSER has won the biennial Critical Assessment of Protein Structure Prediction (CASP) competition, a blind study in which teams worldwide test their tools against unpublished protein structures. The current product is proving useful to the molecular biology community; however, it cannot take advantage of the cloud's extensive parallelization opportunities, nor is it adapted to benefit from protein motion calculations. Either capability could dramatically improve the accuracy of the program's predictions.
We propose to create a massively parallel software pipeline that maximizes the frequency with which it predicts high-resolution protein structures suitable for drug screening and drug design projects. In Phase I, we will evaluate the best way to use faster, deeper, and more diverse computing techniques to predict more accurate structures. This includes evaluating parallelization techniques that perform at least 100 times more calculations than the program performs today, and confirming that an increase in prediction accuracy is achievable by using modified structure template scaffolds. In Phase II, we will use protein motion to improve the accelerated sampling technique. We will then combine that approach with recent advances in Monte Carlo simulation and massive parallelization in a distributed computing environment to enhance accuracy further. Ultimately, instead of just 14 simulations per protein as in the original algorithm, we will support thousands of interconnected simulations. At the conclusion of this work, we will deliver a cloud-based software product accurate enough to be relied upon for pharmaceutical biosimulation projects.

Public Health Relevance

The biological function of a protein is dictated by its 3D structure; however, structure determination efforts are overwhelmed by the sheer number of newly discovered proteins from next-generation DNA sequencing technologies and by a lack of easy-to-use, affordable tools for determining protein structure. We propose to create a suite of software tools, available to all researchers on a cloud computing platform, such that any scientist can efficiently, accurately, and cost-effectively predict the 3D structure of any given protein. This software will be critical to enhancing human health globally by helping scientists better qualify potential drug targets, improve the understanding of how genetic differences lead to differing drug responses among individuals, and support the interactive exploration of the effects of genetic variation on protein structure and function.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Study Section
Special Emphasis Panel (ZRG1-IMST-K (14))
Program Officer
Wehrle, Janna P
Dnastar, Inc.
United States