The SAVI Project is based on:
(a) a set of transforms with rich chemical context annotation, including functional group reactivity data (LHASA, LLC, U.S.; and Lhasa Limited, UK);
(b) a set of highly annotated building blocks (Sigma-Aldrich, Global Strategic Services);
(c) the chemoinformatics toolkit CACTVS, with custom development (Xemistry GmbH, Germany).
The transforms are a set of more than 1,500 rules written in the CHMTRN/PATRAN language for encoding chemical transformations, with chemical context and quality criteria added. They are based ultimately on the pioneering work of E. J. Corey. In contrast to simple SMIRKS transforms, these rules provide:
- Computation of whether a reaction will work at all, depending on the overall structural features of the target.
- Scoring: if the reaction works, how robust it is, taking the overall structural features into account.
- Whether protection of interfering groups is required. If so, this can already be integrated into the final starting material queries to prioritize pre-protected starting materials.
- Proposal of suitable context-dependent reaction conditions.
- Textual warnings in specific circumstances, such as the potential for multiple products, borderline conditions, etc.
Ancillary to the rules is a set of functional group reactivity data, i.e., a table describing whether any of the standard functional groups in the rule set is unstable under any of the standard conditions. The building blocks are a set of several hundred thousand compounds available in gram quantities, with high reliability, from or through Sigma-Aldrich. This set has been annotated with pricing information and other business-intelligence data useful for this project. The chemoinformatics toolkit CACTVS has been expanded in various ways, e.g., with the capability to read the CHMTRN/PATRAN transforms.
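The functional group reactivity table, and its role in deciding whether interfering groups need protection, can be pictured as a simple lookup. The following is a minimal Python sketch, not the LHASA or CACTVS implementation: the condition names, group names, table entries, and the function `groups_needing_protection` are all hypothetical, and it assumes the functional groups of the target have already been perceived upstream.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical reactivity table: standard condition -> functional groups
# assumed to be unstable under that condition (illustrative entries only).
REACTIVITY_TABLE: Dict[str, FrozenSet[str]] = {
    "strong_acid": frozenset({"boc_amine", "acetal"}),
    "strong_base": frozenset({"ester", "alkyl_halide"}),
    "hydride_reduction": frozenset({"aldehyde", "ketone", "ester"}),
}


def groups_needing_protection(
    condition: str,
    groups_in_target: Set[str],
    reacting_groups: Set[str],
) -> Set[str]:
    """Return bystander groups that are unstable under `condition`.

    Groups that take part in the transform are excluded because their change
    is intended. An unknown condition yields an empty set here; a stricter
    system might flag it instead (an assumption of this sketch).
    """
    unstable = REACTIVITY_TABLE.get(condition, frozenset())
    bystanders = set(groups_in_target) - set(reacting_groups)
    return bystanders & unstable


# Example: reducing an aldehyde in a target that also carries an ester and an ether.
print(sorted(groups_needing_protection(
    "hydride_reduction", {"aldehyde", "ester", "ether"}, {"aldehyde"})))
# prints ['ester']
```

A non-empty result would correspond to the "protection required" case; steering the starting material queries toward pre-protected building blocks is not modeled in this sketch.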
An important feature that had to be implemented was reversing the original (retrosynthetic) LHASA transform direction without rewriting the rules, since the SAVI project works strictly in the forward-synthetic direction. Another important capability was the handling of initial and final starting material (SM) queries in four steps: (1) extraction of the initial SM queries from the 2D patterns in the rules; (2) forward reaction from the 2D patterns; (3) scoring (the only step that is original LHASA functionality); (4) expansion of the final SM queries (R-groups, protecting groups, etc.). To filter out structures with less-than-desirable attributes in the drug development context, several additional computed properties regarded as important in current drug design have been implemented, such as the demerit scores based on 275 rules for identifying potentially reactive or promiscuous compounds published by Bruns and Watson (J. Med. Chem. 2012, 55, 9763-9772). In the initial, very early alpha stage of this project, only 11 of the possible 1,500 transforms were used, applied to approx. 230,000 building blocks, in one-step reactions only. The 610,000 resulting products have been annotated with, but not yet filtered by, any of the computed or associated molecular properties. To limit the file size, only on the order of one percent of the theoretically possible products of one-step reactions was sampled. We have also addressed the task of generating schematic graphical representations of the transforms. We ultimately aim to create a database of one billion high-quality screening samples that should be easy and cheap to synthesize. Our first full production run, using 14 transforms and about 377,000 building blocks, resulted in more than 236 million products. These novel molecules are all annotated with a proposed simple, high-yield synthetic route, as well as with 50 molecular properties, generally recognized as important in cutting-edge drug design, that we have implemented.
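Demerit-based filtering of the kind described by Bruns and Watson boils down to summing per-rule penalties and applying a cutoff. The minimal Python sketch below illustrates that idea only. It assumes rule matching (substructure search of each rule's pattern against the product) has already been done upstream. The rule names, demerit values, compound IDs, the "below threshold" semantics, and the 100-point default cutoff are hypothetical placeholders, not the 275 published rules or SAVI's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical rule table: rule name -> demerit points assigned on a match.
DEMERITS: Dict[str, int] = {
    "acyl_halide": 100,
    "michael_acceptor": 50,
    "thiol": 40,
    "long_alkyl_chain": 25,
}


@dataclass
class Compound:
    compound_id: str
    # Names of rules already matched upstream (assumed input of this sketch).
    matched_rules: Set[str] = field(default_factory=set)


def demerit_score(compound: Compound, table: Dict[str, int] = DEMERITS) -> int:
    """Sum demerits of matched rules; each rule counts once, unknown rules count 0."""
    return sum(table.get(rule, 0) for rule in compound.matched_rules)


def passes_filter(compound: Compound, threshold: int = 100) -> bool:
    """Keep a compound only if its total demerits stay below `threshold`."""
    return demerit_score(compound) < threshold


compounds = [
    Compound("SAVI-A", {"thiol"}),                                          # 40
    Compound("SAVI-B", {"michael_acceptor", "thiol", "long_alkyl_chain"}),  # 115
    Compound("SAVI-C", set()),                                              # 0
]
kept = [c.compound_id for c in compounds if passes_filter(c)]
print(kept)  # ['SAVI-A', 'SAVI-C']
```

Counting each matched rule once per compound is a simplification; a fuller scheme could weight rules by the number of matches.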
A web GUI is planned that will give users free access to this database via searches by various criteria, including substructure searches. It will also link to pages where users can request synthesis of the molecule(s) by commercial entities. Additional novel transforms for chemistry not previously in the knowledge base have been written, yielding a total of about 70 productive and drafted transforms. The current set of functional transforms comprises 58. After a change in Sigma-Aldrich's business model, we decided to switch to building blocks from Enamine, from whom we obtained about 151,000 structures. With these, we calculated about 900 million SAVI products in late 2018 and are currently computing more than 2 billion.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Investigator-Initiated Intramural Research Projects (ZIA)
National Cancer Institute Division of Basic Sciences