This EAGER award supports research and education involving a new collaboration, kindled at the MATDAT18 Datathon event, that focuses on advancing understanding of how water interacts with proteins and complex molecular assemblies. Oil and water don't mix. Examples of this common wisdom abound in everyday life, from the sheen formed on a rain puddle by spilled gasoline to the separation of oil and vinegar in a bottle of salad dressing. These are large-scale examples of a physical principle known as hydrophobicity - a word derived from Ancient Greek that characterizes the "horror for water" experienced by particular molecules. This physical principle is also active at microscopic scales, with hydrophobicity playing an important role in controlling the structure and function of molecules in water. Of particular interest is the behavior of proteins: a class of molecules that use the hydrophobic effect to perform functions critical to life, serving as - among many other things - enzymes to help break down food, hormones to regulate physiology, and antibodies to protect against infection. Some physical forces can be described by simple and elegant equations, such as Newton's law of gravitation or Coulomb's law of electrostatics, but decades of work have shown that no such simple descriptions seem to exist for hydrophobicity. Instead, the hydrophobic interaction is a complicated force that depends sensitively on the details of the hydrophobic molecule and its interactions with the water molecules around it. Unraveling the details of this interaction in the context of proteins is important both for understanding the fundamentals of this ubiquitous force and for discovering and designing proteins to serve as new drugs or novel molecular machines.

How might one probe and understand the complexities of hydrophobicity? Artificial intelligence techniques are now ubiquitous in modern life, serving as recommendation engines for online shopping, automatically recognizing faces in camera phones, and enabling autonomous and assisted driving. Conventional computer programs work by executing a pre-programmed set of rules to achieve an outcome; modern artificial intelligence techniques are instead provided with a set of examples and automatically learn the rules from the data. This research project will use artificial intelligence techniques to learn the rules of hydrophobicity from computer simulations of water around proteins. Specifically, the project will use sophisticated molecular simulations to model the interactions and dynamics of water molecules, together with special tools to measure hydrophobicity, to compile databases of the "horror for water" of different regions of the protein surface. Artificial intelligence tools will then analyze these databases to find a mathematical model relating hydrophobicity to the chemical composition and shape of the protein surface. The models learned in this way will help untangle the complexities of hydrophobicity and can be used to quickly predict how proteins in water will behave. The tools will also be adapted and analyzed to provide human-interpretable explanations that offer new understanding of hydrophobicity rather than just furnishing a complicated mathematical model.

These research activities will provide new models and understanding of protein hydrophobicity that can be used to search for new drug molecules and engineer proteins with new structures and functions. The simulation and artificial intelligence tools will be made broadly available to the scientific community and general public through open-source molecular simulation packages, free software libraries, and online code-sharing sites. Undergraduate students will be involved in the research projects through 10-week paid summer internships offered in each year of the award. These research experiences will be designed according to best practices in providing authentic and valuable training experiences, and special efforts will be made to recruit students from groups traditionally underrepresented in science, technology, engineering, and math fields.

Technical Abstract

This EAGER award supports research and education involving a new collaboration, kindled at the MATDAT18 Datathon event, that focuses on integrating sophisticated molecular simulation tools with machine learning techniques to understand hydrophobicity at the nanoscale. The hydrophobic effect - the tendency for non-polar moieties to cluster together and exclude water molecules in aqueous solvent - plays an important role in the interactions and assemblies of complex molecules, such as cavitands, dendrimers, and proteins. However, quantifying the hydrophobicity of such molecules, which display complex chemical and topographical patterns at the nanoscale, has proven to be an enduring open challenge. Recent work has illuminated the failure of additive approaches that attempt to decompose molecular hydrophobicity into a sum of the hydrophobicities of the constituent surface groups, and has demonstrated that hydrophobicity at the nanoscale is a complex, collective, many-body response of hydration waters to chemical and topographical surface cues. This complexity not only frustrates a fundamental molecular understanding of hydrophobicity at the nanoscale, but also has important practical consequences, such as the inability to accurately screen ligands for drug discovery.
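
As a schematic illustration of the additive picture described above, the hydrophobicity of a surface would be written as a weighted count of its surface groups. The symbols here are illustrative assumptions and do not appear in the award text: n_i is the number of surface groups of type i, g_i is a fixed per-group contribution, and t_j and r_j are the type and position of group j.

```latex
% Additive approximation: depends only on how many groups of each type are present
\Delta G_{\mathrm{additive}} \approx \sum_{i=1}^{M} n_i \, g_i

% Collective (many-body) picture: depends on the full arrangement of groups
\Delta G = f\bigl(\{(t_j, \mathbf{r}_j)\}_{j=1}^{N}\bigr)
```

In the additive form, two surfaces with the same group counts but different spatial patterns receive the same value; the collective form allows them to differ, which is the behavior the abstract attributes to nanoscale hydrophobicity.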

The overall goal of this research project is to conduct enhanced sampling molecular simulations to accurately quantify the hydrophobicity of an extensive library of nanostructured surfaces through the free energy of cavity formation, and to deploy supervised machine learning techniques to unveil new understanding of the physical, chemical, and topographical cues governing surface hydrophobicity. The central hypothesis of this project is that the application of data-centric tools can provide new understanding of the molecular determinants of hydrophobicity beyond what is possible with simple conceptual models and human intuition. The first objective of this work is to quantify the hydrophobicity of an extensive library of patterned self-assembled monolayer surfaces and proteins by estimating the free energy of interfacial cavity formation using enhanced sampling techniques. The second objective is to conduct supervised learning over the hydrophobicity libraries to construct quantitative structure-property relationship models relating chemical composition and physical structure to the free energy of interfacial cavity formation. A number of machine learning techniques will be explored, including support vector machines, random forests, partial least squares regression, and artificial neural networks. The techniques will be adapted to be physics-aware by "baking" the physics of hydrophobicity into the model, and to be explainable by furnishing human-interpretable understanding of their behaviors.
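
To make the target quantity concrete, the following minimal sketch assumes the standard statistical-mechanical definition of the cavity formation free energy, ΔG = -k_B T ln P(0), where P(0) is the probability that a probe volume next to the surface contains zero water molecules. The abstract does not specify the estimator or the enhanced sampling method, so everything here is an assumption for illustration: the function names, the hypothetical water-count data, and in particular the Gaussian extrapolation, which is a simple stand-in and not an enhanced sampling technique.

```python
import math

# Molar gas constant in kJ/(mol K); multiplying by T gives k_B*T per mole.
R_KJ_PER_MOL_K = 0.0083144626


def cavity_free_energy_direct(counts, temperature=300.0):
    """Estimate dG = -kT ln P(0) from the fraction of frames with an empty probe volume.

    counts: per-frame number of water molecules observed in the probe volume.
    Returns the free energy in kJ/mol.
    """
    p_empty = sum(1 for n in counts if n == 0) / len(counts)
    if p_empty == 0.0:
        # Empty cavities are rare events; unbiased sampling often never sees one.
        raise ValueError("no empty-cavity frames observed; direct estimate unavailable")
    return -R_KJ_PER_MOL_K * temperature * math.log(p_empty)


def cavity_free_energy_gaussian(counts, temperature=300.0):
    """Extrapolate P(N) to N = 0 assuming Gaussian water-number statistics.

    With mean m and variance v, -ln P(0) = m^2 / (2 v) + 0.5 * ln(2 pi v).
    Illustrative simplification only (assumption, not the project's method).
    """
    n_frames = len(counts)
    mean = sum(counts) / n_frames
    var = sum((n - mean) ** 2 for n in counts) / (n_frames - 1)
    beta_dg = mean ** 2 / (2.0 * var) + 0.5 * math.log(2.0 * math.pi * var)
    return R_KJ_PER_MOL_K * temperature * beta_dg


if __name__ == "__main__":
    # Hypothetical water counts for a very small probe volume (not simulation data).
    small_volume = [0, 1, 1, 2, 0, 1, 3, 1, 2, 1]
    print(cavity_free_energy_direct(small_volume))

    # Hypothetical counts for a larger volume in which N = 0 is never observed.
    larger_volume = [4, 5, 6, 5, 4, 6, 5, 5]
    print(cavity_free_energy_gaussian(larger_volume))
```

The direct estimator fails whenever no empty-cavity frame is sampled, which is the practical reason biased or enhanced sampling is needed to reach the N = 0 tail for volumes of useful size. Free energies computed this way for many surface regions would form the labels for the supervised learning step described above.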

Successful completion of this research will have impacts in both materials and data science. From a materials science perspective, unveiling the molecular determinants of protein hydrophobicity - the relation between the topographical and chemical patterns displayed by the protein and the free energy of cavity formation - will shed new light on the driving force behind protein interactions and assembly, furnish precepts for rational engineering of protein structure and function, and open up applications in virtual high-throughput screening for the computational discovery of drugs, ligands, bioseparation agents, and co-solutes to modulate protein solubility. From a data science perspective, this work will establish new physics/chemistry-aware machine learning tools whose behaviors are more interpretable and comprehensible in the analysis of molecular behaviors than generic off-the-shelf techniques.

The Division of Materials Research and the Chemistry Division contribute funds to this award.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Materials Research (DMR)
Type: Standard Grant (Standard)
Application #: 1844514
Program Officer: Daryl Hess
Budget Start: 2019-01-01
Budget End: 2021-12-31
Fiscal Year: 2018
Total Cost: $231,647

Name: University of Pennsylvania
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104