The Chemical Abstracts Service (CAS) recently recorded the 53 millionth unique chemical substance in the CAS Registry, only 9 months after cataloging the 40 millionth. This explosive growth raises the question of what physical, chemical, and biological properties these substances possess. Existing data for fundamental physical and chemical properties, such as dissociation energies, logP, enthalpies of formation, refractive indices, boiling points, and melting points, define Structure-Property Relationships (SPRs) and are accessible through SciFinder and Beilstein. Similarly, fundamental biological properties, such as enzyme binding constants, are available through the ChEMBL and PubChem databases.
This research creates an innovative tool, built on volunteer computing, that predicts and maps SPRs in three dimensions using a novel computational algorithm, PROPMAP, for education and research. The PROPMAP architecture combines a Monte Carlo random walk structure generator with Quantitative Structure-Property Relationship (QSPR) models based on a support vector regression algorithm accelerated on graphics processing units (GPUs). PROPMAP interactively maps user-selected physical, chemical, and biological properties onto any input chemical structure of interest. This graphical representation fosters understanding of SPRs in basic chemistry education and enables targeted property modifications in research. The approach uses volunteer GPU computing through the Berkeley Open Infrastructure for Network Computing (BOINC) to overcome the otherwise prohibitive computational expense of training and cross-validating models on data sets of more than one million substances. PROPMAP's impact on the scientific community is broadened by making the tool freely available through a web interface for use in education and research, and by directly training institutions in its use through workshops and seminars.
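The training and cross-validation workload that motivates the use of BOINC can be illustrated with a small sketch. This is not PROPMAP's implementation: it is a minimal, CPU-only, pure-Python stand-in that assumes a linear kernel, fits an epsilon-insensitive support vector regression (SVR) model by stochastic subgradient descent on the primal objective, and scores it with k-fold cross-validation. The function names (`train_linear_svr`, `predict`, `k_fold_indices`, `cross_validate`), the two-descriptor toy data, and the hyperparameter defaults are all hypothetical and chosen only for illustration. Because each fold trains independently of the others, each fold is a natural candidate for a separate volunteer-computing work unit.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def predict(w: Vector, b: float, x: Vector) -> float:
    """Linear model output w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X: List[Vector], y: List[float], C: float = 10.0,
                     epsilon: float = 0.1, lr: float = 0.001,
                     epochs: int = 300, seed: int = 0) -> Tuple[Vector, float]:
    """Fit a linear epsilon-SVR by stochastic subgradient descent on
    0.5*||w||^2 + C * sum_i max(0, |y_i - (w . x_i + b)| - epsilon).
    Hypothetical toy trainer: no kernel and no GPU acceleration."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            residual = y[i] - predict(w, b, X[i])
            # Subgradient of the epsilon-insensitive loss w.r.t. the prediction:
            # zero inside the tube, -1 when under-predicting, +1 when over-predicting.
            if residual > epsilon:
                g = -1.0
            elif residual < -epsilon:
                g = 1.0
            else:
                g = 0.0
            # The regularizer gradient is split evenly across the n samples (w / n).
            for j in range(d):
                w[j] -= lr * (w[j] / n + C * g * X[i][j])
            b -= lr * C * g
    return w, b


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(X: List[Vector], y: List[float], k: int = 5,
                   **svr_kwargs) -> List[float]:
    """Return the held-out RMSE for each of k folds. Every fold is an
    independent training job, so folds could be run on different machines."""
    rmses = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w, b = train_linear_svr([X[i] for i in train],
                                [y[i] for i in train], **svr_kwargs)
        sq_err = [(y[i] - predict(w, b, X[i])) ** 2 for i in held_out]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return rmses


if __name__ == "__main__":
    # Synthetic toy data: two made-up descriptors and a known linear "property".
    rng = random.Random(42)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
    y = [2.0 * x1 - 1.0 * x2 + 0.5 for x1, x2 in X]
    print("per-fold RMSE:", [round(r, 3) for r in cross_validate(X, y)])
```

Running the script prints one held-out RMSE per fold for the synthetic data; these numbers are illustrative only. A real QSPR study would replace the toy descriptors with computed molecular descriptors, would likely use a nonlinear kernel, and at the scale of a million substances would need far more compute per fold, which is the cost the volunteer-computing design is meant to absorb.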