Protein-ligand binding affinity is the principal determinant of many vital processes, such as cellular signaling, gene regulation, metabolism, and immunity, that depend upon proteins binding to some substrate molecule. Consequently, it has a central role in drug design. Because experimental drug discovery is prohibitively costly and slow, academic laboratories and pharmaceutical and biotechnology companies rely on virtual screening using computational molecular docking. Typically, this involves docking tens of thousands to millions of ligand candidates into a target protein receptor's binding site and using a suitable scoring function (SF) to evaluate the binding affinity of each candidate, so that the top candidates can be identified as leads or promising protein inhibitors. Since the SF is used to score, rank, and identify drug leads, both the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site and its computational complexity have a significant bearing on the accuracy and throughput of virtual screening. However, current state-of-the-art scoring functions have a number of deficiencies: they offer either mediocre affinity-prediction accuracy or low throughput, their accuracy is inconsistent, they provide little flexibility in the accuracy-throughput trade-off, and they rely on only a single category of scoring function.
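The score-and-rank step of virtual screening described above can be illustrated with a minimal sketch. It is not the project's method: the function name `screen`, the toy lookup table, and the convention that a higher score means a higher predicted affinity are all assumptions made for illustration, and a real scoring function would be computed from the docked protein-ligand complex.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen(ligands: Iterable[str],
           score: Callable[[str], float],
           k: int) -> List[Tuple[str, float]]:
    """Score every ligand candidate and return the k best, highest score first.

    Assumption: a higher score means a higher predicted binding affinity.
    """
    scored = ((ligand, score(ligand)) for ligand in ligands)
    # nlargest keeps only k candidates in memory, which matters when the
    # library holds millions of ligands.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical scores standing in for a real scoring function.
toy_scores = {"lig_a": 5.2, "lig_b": 9.1, "lig_c": 7.4, "lig_d": 3.0}

leads = screen(toy_scores.keys(), toy_scores.__getitem__, k=2)
print(leads)
```

The cost of each `score` call dominates the run time, which is why the computational complexity of the scoring function bounds screening throughput.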
INTELLECTUAL MERIT: Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes remains one of the most challenging problems in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. We seek to address this problem by developing efficient discrete optimization algorithms that facilitate: (1) the design of accurate, high-throughput single-SF and multi-SF methods with provable optimality for a given protein-ligand complex dataset; (2) the determination of biochemically relevant SFs through novel biochemical rule filters that suitably constrain the protein-ligand complex features selected; (3) robust prediction through a novel multi-SF approach that reduces the variance in accuracy associated with relying on only a single SF; and (4) flexibility in the accuracy-throughput trade-off through a new integrated dynamic multi-SF approach.
BROADER IMPACTS: This project will have several broader impacts: (1) public health benefits from more efficient and cost-effective drug discovery, which in turn helps lower drug costs and improve affordability; (2) impact on other domains that use scoring-function-based approaches; (3) interdisciplinary training of students in an important application area; (4) dissemination of the research results and software artifacts developed during the project; and (5) participation and training of underrepresented groups, together with K-12 outreach.