Most drugs interact with protein molecules to elicit a cellular response. Traditional drug discovery is a laborious and expensive experimental process, so computational approaches that assess protein function and accelerate discovery are in high demand. Virtual drug screening and structure-based drug design are two such approaches that can play an important role in modern drug discovery and development. Both rely on high-resolution tertiary (3D) protein structures and are hampered by the slow and often unsuccessful methods of experimental structure determination. Protein structure prediction is poised to impact human health by accelerating the construction of high-confidence structural models of drug targets and biopharmaceuticals, which will help identify new therapeutic strategies. However, current methods are very limited in their ability to predict high-resolution models, which is preventing broad classes of therapeutics from being discovered. In addition, technologies are needed to predict as early as possible whether a candidate drug will fail in the development process. With improvements in accuracy, protein structure prediction can be used to lower drug development costs and focus experiments on the most promising drug candidates. DNASTAR recently released NovaFold, a commercial version of the world-leading I-TASSER protein folding algorithm (Yang Zhang, U. Michigan) running on a cloud computing platform. Since 2006, I-TASSER has won the biennial Critical Assessment of Protein Structure Prediction (CASP) competition, a blind study in which teams worldwide test their tools against unpublished protein structures. The current product is proving useful to the molecular biology community; however, it cannot take advantage of the cloud's extensive parallelization opportunities, nor is it adapted to benefit from protein motion calculations, each of which could dramatically improve the accuracy of the program's predictions.
We propose to create a massively parallel software pipeline that predicts, at the highest possible frequency, high-resolution protein structures suitable for drug screening and drug design projects. In Phase I, we will evaluate the best way to use faster, deeper, and more diverse computing techniques to predict more accurate structures. This includes evaluating parallelization techniques to perform at least 100 times more calculations than the program performs today and confirming that an increase in prediction accuracy is achievable by using modified structure template scaffolds. In Phase II, we will use protein motion to improve the accelerated sampling technique. Additionally, we will combine that approach with recent Monte Carlo simulation advancements and massive parallelization in a distributed computing environment to enhance the accuracy further. Ultimately, instead of just 14 simulations per protein as in the original algorithm, we will support thousands of interconnected simulations. At the conclusion of this work, we will deliver a cloud-based software product accurate enough to be relied upon for pharmaceutical biosimulation projects.

Public Health Relevance

The biological function of a protein is dictated by its 3D structure; however, structure determination efforts are overwhelmed by the sheer number of newly discovered proteins from next-generation DNA sequencing technologies and by a lack of easy-to-use, affordable tools for determining protein structure. We propose to create a suite of software tools available to all researchers on a cloud computing platform so that any scientist can efficiently, accurately, and cost-effectively predict the 3D structure of any given protein. This software will be critical to enhancing human health globally by helping scientists better qualify potential drug targets, improve the understanding of how genetic differences lead to differing drug responses among individuals, and interactively explore the effects of genetic variation on protein structure and function.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 4R44GM110814-02
Application #: 8931346
Study Section: Special Emphasis Panel (ZRG1-IMST-K (14))
Program Officer: Wehrle, Janna P
Project Start: 2014-06-10
Project End: 2016-12-31
Budget Start: 2015-01-01
Budget End: 2015-12-31
Support Year: 2
Fiscal Year: 2015
Total Cost: $749,352
Indirect Cost:
Name: Dnastar, Inc.
Department:
Type:
DUNS #: 130194947
City: Madison
State: WI
Country: United States
Zip Code: 53705