Cheminformatics can be conceptualized as attempting to learn predictable relationships between the measured performance of small molecules and their chemical structures. In this context, chemical structure is typically represented by a numerical description of the molecular constitution, connectivity, size, shape, and reactivity of small molecules. However, the most widely used molecular descriptions eschew three-dimensional considerations such as stereochemistry and molecular shape in favor of speed of calculation, and almost no such descriptions explicitly consider the strategies underlying small-molecule syntheses. This is particularly problematic for our CMLD, since stereochemistry and small-molecule synthesis lie at the heart of our research. Moreover, most existing cheminformatic approaches consider multiple structural features of small molecules in the context of a single performance measurement (e.g., enzyme inhibition). Increasingly, small-molecule screening exploits the ability to make multiple measurements per well (e.g., gene expression- or image-based screening) or in parallel. Such datasets provide a unique opportunity for cheminformatics research: the relevance of quantitative descriptions of chemical structure, and their connections with synthetic planning, can be judged using the actual performance of small molecules in multiple biological assays. Project 4 aims to exploit this opportunity. Direct modulation, with small molecules, of the cellular pathways that underlie phenotypic change makes it possible to dissect cell circuitry and even disease biology. To enable discovery and optimization, both chemical biology and drug discovery rely on the chemical similarity principle: small molecules with similar structures have similar performance. Multidimensional datasets reflecting actual performance provide a computational basis for finding structural descriptions whose similarities best accord with similarities in small-molecule performance.
In turn, these descriptions can be used to guide synthetic chemistry planning by calculating relevant structural properties for new small molecules in advance of their synthesis. Such descriptions of structure could provide guidance to synthetic chemists seeking to make a small-molecule screening collection with optimal properties. Particularly in the pilot synthesis phase, where relatively small numbers of compounds are made available for biological testing, it is important to plan syntheses such that maximal information about structure/activity relationships can be learned from pilot screening experiments. In the context of the build/couple/pair (B/C/P) strategy for diversity synthesis that is being explored in our CMLD (Projects 1 and 2, supported by Project 3), direct computational connections between synthetic choices and descriptors of molecular complexity, stereochemistry, and shape will help guide synthetic planning. Coupling biological performance annotation to subsets of molecules realized as pilot libraries will provide guidance (e.g., to the Library Synthesis Core) as to which candidate small-molecule collections are most likely to show biological activity and the greatest diversity of biological activities.
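The idea of choosing structural descriptions whose similarities best accord with similarities in performance can be illustrated with a minimal sketch. It is not the project's method: the choice of Tanimoto similarity for structure, Pearson correlation for performance profiles, and Spearman rank correlation as the agreement score are all assumptions made for illustration, and every compound name, feature set, and assay value below is hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(values):
    """Average ranks (tied values share their mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def descriptor_agreement(features, profiles):
    """Spearman correlation between pairwise structural similarity and
    pairwise similarity of multi-assay performance profiles."""
    pairs = list(combinations(sorted(features), 2))
    structural = [tanimoto(features[a], features[b]) for a, b in pairs]
    performance = [pearson(profiles[a], profiles[b]) for a, b in pairs]
    return pearson(ranks(structural), ranks(performance))


# Hypothetical performance profiles: one value per assay for each compound.
profiles = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, 2.0, 3.5],
    "C": [3.0, 2.0, 1.0],
    "D": [3.2, 2.0, 1.0],
}

# Two hypothetical candidate descriptions of the same four compounds.
descriptor_1 = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {7, 8, 9}, "D": {7, 8, 10}}
descriptor_2 = {"A": {1, 2}, "B": {5, 6}, "C": {1, 2}, "D": {5, 6}}

score_1 = descriptor_agreement(descriptor_1, profiles)
score_2 = descriptor_agreement(descriptor_2, profiles)
print(f"descriptor_1 agreement: {score_1:.2f}")
print(f"descriptor_2 agreement: {score_2:.2f}")
```

In this toy setting, the description with the higher agreement score groups compounds in a way that better tracks their measured performance, which is the sense in which one description would be preferred over another. A real analysis would need many more compounds and assays, and other similarity measures may be more appropriate.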