Machine learning has broad applicability in chemistry and biology. This research effort focuses on the empirical derivation of functions that predict aspects of molecular interaction between proteins and ligands. The problem poses unique challenges for machine learning, chief among them that the configuration in which molecules interact is not generally known. For small molecule-protein interactions, where molecules can be represented as 3D objects, this uncertainty takes the form of hidden variables: the relative conformation and alignment of protein and ligand. Most machine learning tasks do not embed hidden variables in this fashion, but the problem is not insurmountable. We have implemented a number of methods which demonstrate that the problem of hidden variables is tractable, both methodologically, in model induction and scoring function optimization, and computationally, in the complexity of search. In this work, we will develop novel methods and refine existing methods in three problem areas: 1) developing scoring functions for small molecule-protein interactions with a known protein structure (the docking problem); 2) developing quantitative models of small molecule activity against proteins with no known structure (the 3D QSAR problem); and 3) developing methods for search and optimization that improve both the induction of models and scoring functions and their high-throughput application to large libraries of small molecules. The goal is to address the problem of prediction in a quantifiable way, which will both allow practical improvements in applications of the methods and provide insight into the mechanistic aspects of the underlying physical molecular interactions. All methods and data will be made widely available to both academic and industrial investigators.