The growing importance of artificial intelligence (AI) is evident from the growth in the number of companies using machine learning to assist in drug discovery, and from the increasing number of deals over the past year between pharma and these smaller companies. The continuing steady growth of structure-activity data for diverse targets, diseases and molecular properties poses a considerable challenge because these data are generally not readily accessible for machine learning: content resides in a mixture of public databases (with differing levels of curation), disparate files within research groups, and non-curated literature publications. In Phase I, Collaborations Pharmaceuticals Inc. developed a prototype of the Assay Central software and used it with a wide variety of structure-activity data from sources both public and private, formatted and unformatted, to enable work on neglected, rare or common disease targets. Public data were combined with collaborator- and customer-contributed data, using original software and the applied chemistry judgment of an expert team. In Phase I we also created error checking and correction software, built and validated Bayesian models with the datasets that were collected and cleaned, and developed new data visualization tools. The software environment that we created readily enables the user to compile structure-activity data for building computational models and can be used to create selections of these models for sharing with collaborators as needed. This software can in turn be used for scoring new molecules and visualizing the multiple outputs in various formats. We have enabled ~14 collaborative projects that have shared models on specific targets such as PyrG for tuberculosis (identifying a lead compound), HIV reverse transcriptase and whole cell screening for leishmaniasis, as well as P450 and nuclear receptor models (e.g. estrogen receptor) relevant to toxicology.
We have also used Assay Central in our ongoing internal small molecule drug discovery projects on Ebola, HIV and tuberculosis. In Phase II, we propose the following aims, which will enable us to develop Assay Central into a production tool for the drug discovery collaborations that remain our focus. In Phase I we performed a preliminary analysis of different machine learning algorithms with select drug discovery datasets. In Phase II we will perform a thorough evaluation and selection of additional machine learning algorithms and molecular descriptors, and assess combinations of algorithms (e.g. Bayesian and Deep Learning). We will implement disease/target definitions for machine learning models to facilitate drug discovery. We will enable molecule selection and automated design and optimization. Having a tool such as Assay Central readily available will empower scientists to leverage public data, private data or a combination of both to help with their drug discovery tasks. Developing this software suite of computational models with public data will enable us to identify foundations, academics and potential collaborators that generate preliminary data to test models. These efforts will dramatically increase the number of projects we can work on, create new IP, and generate employment using machine learning focused on drug discovery, in particular in the area of rare and neglected diseases. Assay Central benefits include: 1. Ease of deployment and use, with a Java file executed by users without the need for IT support; 2. Built on industry standard technologies; 3. Graphical display of models provides instant feedback; 4. Model applicability with multiple methods to assess scores and graphics.
In Phase I, Collaborations Pharmaceuticals Inc. developed a prototype of the "Assay Central" software and used it with a wide variety of structure-activity data from sources both public and private, formatted and unformatted, to enable work on neglected, rare or common disease targets. The software environment that we created readily enables us to compile structure-activity data for building computational models, can be used to create selections of these models for sharing with collaborators, and has been used successfully to find promising lead compounds for many collaborators. In Phase II we now propose to optimize the underlying machine learning algorithms and molecular descriptors (e.g. combining Bayesian and Deep Learning), link potentially thousands of rare and neglected diseases to the machine learning models, and automate molecule selection and new molecule design to enable further high value drug discovery collaborations with Assay Central.