Decades of scientific enquiry beyond molecular biology have demonstrated just how fundamental form is to function, whether in understanding phase transitions in statistical physics, predicting the evolution and dynamics of real networks in network science, or steering an articulated robot arm to a target pose. A fundamental question in all these domains is how to explore the space of all possible forms of a dynamic system effectively, so as to uncover those forms that satisfy the non-trivial constraints imposed by function. The most visible instantiation of this question in computational structural biology is de novo protein structure prediction (PSP). PSP takes a structure-driven view of molecular mechanisms in the cell and seeks to determine one or more biologically active (native) structures of a protein from knowledge of its chemical composition. Elucidating such structures is central to inferring the biological activities of a rapidly growing number of protein-encoding gene sequences, and thus to advancing our understanding of the inner workings of a cell. While PSP has a natural formulation as a stochastic optimization problem, current efforts are approaching a saturation point. This project proposes a radically different, complementary approach. Inspired by recent momentum in generative deep learning, the project approaches de novo PSP under the umbrella of generative, adversarial deep learning. The approach is firmly grounded in information integration and informatics: it proposes generative models that learn in an adversarial setting to generate native-like tertiary protein structures. The project benefits researchers in machine learning, deep learning, and information integration with interests in graph generative models, molecule generation, and protein structure prediction.
The project will result in open-source code, online teaching modules and tutorials, publicly available data and models, workshops, and software demos, and it will broaden the participation of under-represented students in computing.

The activities in this project chart a new algorithmic path, under the umbrella of information integration and informatics, to address the current impasse in structure-function problems in molecular biology. The focus is on the de novo protein structure prediction problem. With experimental structure determination lagging behind the rapidly growing number of protein-encoding gene sequences produced by high-throughput sequencing technologies, computational approaches have a central role in molecular biology research. Great progress has been made through stochastic optimization, but current approaches are experiencing diminishing returns, partly due to fundamental challenges concerning resource-aware exploration-exploitation control in complex search spaces and inherently inaccurate scoring functions. This project puts forth a novel approach to structure prediction under the umbrella of generative, adversarial deep learning, leveraging recent advances and opportunities in graph generative learning, adversarial learning, and deep learning. Generative models learn in an adversarial setting to generate native-like tertiary protein structures. The proposed activities span multiple disciplines and promise to make general contributions in machine learning, deep learning, explainable AI, molecular modeling, and computational biology. The work will also benefit researchers and students interested in modeling complex, dynamic systems. The investigators will disseminate the proposed research via open-source code in C++ and Python, online teaching modules and tutorials, and trained models and data, so as to reach diverse communities of researchers and students. They will actively educate the involved communities through workshops, tutorials, and software demonstrations. This interdisciplinary project also creates excellent opportunities to broaden the participation of under-represented students of all backgrounds in computing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 2110926
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2020-10-01
Budget End: 2022-07-31
Support Year:
Fiscal Year: 2021
Total Cost: $385,187
Indirect Cost:
Name: Emory University
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30322