The goal of this work is to provide interactive computer visualizations that research scientists can use to interpret high-throughput screening (HTS) data and to make combinatorial chemistry choices. The simplest drug discovery principle is that compounds that are similar in enough properties usually have similar biological activity. Similarity is often measured in high-dimensional spaces, such as those defined by molecular fingerprints or shape descriptors. In drug discovery research, similarity measures may be applied to millions of compounds from virtual libraries of potentially synthesizable compounds. Examining relationships among vast numbers of compounds in diversity space through simple graphical interactions with two-dimensional maps of that space allows the intuition of experienced scientists to come into play. The algorithms for visualizing thousand-dimensional diversity spaces rely on horizons, which are distances beyond which the distance matrix need not be resolved, and on efficient subsampling methods. When combined in genetic algorithms, these concepts also enable the selection of optimal descriptors for clustering compounds for predictive use. Optimal descriptors help not only in visualizing important features of diversity space but also in deciding which compounds to make and test next during early analoging of active substances.
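The horizon idea can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and uses Tanimoto distance as the dissimilarity measure; the abstract names neither, so both are assumptions, and the function names `tanimoto_distance` and `distances_within_horizon` are hypothetical. Pairs farther apart than the horizon are left unresolved and are not stored.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / union


def distances_within_horizon(fingerprints, horizon):
    """Return {(i, j): distance} only for pairs whose distance is <= horizon.

    Pairs beyond the horizon are omitted, so the stored matrix stays sparse.
    """
    near = {}
    for (i, a), (j, b) in combinations(enumerate(fingerprints), 2):
        d = tanimoto_distance(a, b)
        if d <= horizon:
            near[(i, j)] = d
    return near


# Toy fingerprints: two pairs of close analogs with no bits shared across pairs.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}]
near = distances_within_horizon(fps, horizon=0.7)
print(sorted(near))
```

This sketch still evaluates every pair, so it shows only the storage side of the idea. A practical implementation would presumably combine the horizon with subsampling or an index so that distant pairs are never computed at all.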
Software that performs diversity selections needs these visualization tools. Compound libraries offered for random screening or for following up on hits are more valuable when their designs can be illustrated. The tools also apply to new areas such as the analysis of differential gene expression data. New methods for analyzing HTS data have the commercial potential to improve the process of early drug discovery research.