This proposal is submitted in response to PA 14-156, "Extended Development, Hardening, and Dissemination of Technologies in Biomedical Computing, Informatics, and Big Data Science," which aims to support continued development of software and databases for informatics science. Here, we outline the development of our resource, Binding MOAD (Mother of All Databases, pronounced "mode" as a pun on binding modes for ligands). MOAD is one of the largest collections of high-quality protein-ligand complexes available from the scientific literature and the Protein Data Bank (PDB). PA 14-156 notes that projects should be of interest to most NIH Institutes and Centers. Importantly, MOAD includes all protein-ligand complexes from all organisms in the PDB, which makes it applicable to all areas of human health and biomedical research. The complexes in MOAD are curated to correct errors, to differentiate biologically relevant ligands from cofactors and crystallographic additives, and to annotate complexes with binding affinity data when available. Curated data is essential for rigorous, reproducible science. Furthermore, MOAD's HiQ subset is the gold standard for docking calculations and provides a solid foundation for method development. MOAD is a rich dataset with significant impact on the scientific community. The database and website (www.BindingMOAD.org) have been cited hundreds of times in the scientific literature, and the website receives ~25,000 visits each year. MOAD's rate of 510 hits/wk is lower than the traffic at BindingDB or ZINC, but higher than the traffic to Shoichet's SEA, DOCK Blaster, or DUD-E (DUD enhanced) utilities. The resource and online tools are used across a wide range of scientific disciplines: bioinformatics, structural biology, biophysics, protein science, medicinal chemistry, theoretical chemistry, and computer science.
Scientists use MOAD to examine patterns of molecular recognition, elucidate enzyme mechanisms, develop methods for structure-based studies, predict toxicology, and develop new protein-folding routines that incorporate cofactors and ligands. Our long-term goal is to provide tools for computational biology that meet users' diverse scientific needs, help uncover new relationships, and inspire new hypotheses from large datasets. Guided by structural biology and cheminformatics, we can filter the PDB's Big Data into intuitive patterns of ligand and receptor similarity. Beyond the intrinsic value of the data itself, the innovation of this proposal lies in linking chemical and biological data in new ways to reveal potential polypharmacology networks. Our hypothesis is that similar ligands are likely to bind to the same binding sites, and conversely, similar binding sites are likely to bind the same small molecules. To make the links between similar ligands and pockets more accessible to users, we propose using "chemical similarity trees" to display new ligand-target pairings with potential biological significance. We will also create polypharmacology wiki pages for MOAD. Potential ligand-target pairings will be made available to our user base, allowing us to crowd-source their diverse biomedical expertise.
The project provides unique data, tools, and resources that are needed to advance the field of structure-based drug design. These improvements will enable better drug-discovery methods, which will save time and money in the development of new treatments and ultimately lead to less expensive drugs for the broader population.