With support from the Chemical Structure, Dynamics and Mechanisms - B Program in the Division of Chemistry and in response to the Data-Driven Discovery Science in Chemistry (D3SC) Dear Colleague Letter, Professor Richard N. Zare at Stanford University is working on optimizing chemical reactions in microdroplets with deep reinforcement learning. Unoptimized reactions are expensive because they waste time and reagents. A common way for chemists to explore reaction optimization is to change one variable at a time while holding all other variables fixed. This method, however, might not find the best conditions, that is, the global optimum. Another way is to search across all combinations of reaction conditions by using batch chemistry. This approach has a better chance of finding the globally optimal conditions, but it is time-consuming and expensive. Deep reinforcement learning is believed to be a superior approach, in which the computer analyzes a large data set and recognizes the patterns of features that lead to the best reaction outcomes. It is like training a dog: suppose we want the dog to pick up a ball. If the dog does what we want, we say "Good dog!"; if it does not, we say "Bad dog!". Similarly, Professor Zare uses a machine learning method that gives the system a positive reward if the reaction reaches a better result than previous ones, or a negative reward if it does not. Repeating this process will eventually yield a set of best reaction conditions for a given reaction. Professor Zare and his group apply this approach to microdroplet chemistry, where many reactions can be carried out in small droplets and accelerated by factors of one thousand to one million compared with the same reactions in bulk solution. By combining the efficient deep reinforcement learning method with accelerated microdroplet reactions, Professor Zare and his group are seeking to find optimal reaction conditions quickly.
This combined approach could represent a significant step toward enabling artificial intelligence to optimize chemical reactions, which should benefit chemical production, drug screening, and materials discovery. The students in the Zare group enjoy the unique opportunity to gain experience in microdroplet chemical synthesis, fast chemical characterization, and deep learning-based analysis of complex data.
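The difference between changing one variable at a time and searching all combinations of conditions can be illustrated with a toy example. The sketch below is illustrative only and is not taken from the project: the function `toy_yield`, its two normalized inputs (labeled `temp` and `ph` purely for readability), and the peak positions and heights are all hypothetical. It builds a yield surface with a lower local peak and a higher global peak, then compares the two search strategies on an 11-by-11 grid of conditions.

```python
import math
from itertools import product

def toy_yield(temp, ph):
    """Hypothetical yield surface on [0, 1] x [0, 1] with two peaks (assumption)."""
    local_peak = 0.6 * math.exp(-((temp - 0.2) ** 2 + (ph - 0.2) ** 2) / 0.045)
    global_peak = 1.0 * math.exp(-((temp - 0.8) ** 2 + (ph - 0.8) ** 2) / 0.045)
    return local_peak + global_peak

# Eleven evenly spaced levels per variable: 0.0, 0.1, ..., 1.0.
LEVELS = [i / 10 for i in range(11)]

def one_variable_at_a_time(start, max_rounds=10):
    """Sweep one variable while holding the other fixed, then swap, until stable."""
    temp, ph = start
    for _ in range(max_rounds):
        new_temp = max(LEVELS, key=lambda t: toy_yield(t, ph))
        new_ph = max(LEVELS, key=lambda p: toy_yield(new_temp, p))
        if (new_temp, new_ph) == (temp, ph):
            break  # no single-variable change improves the yield any further
        temp, ph = new_temp, new_ph
    return (temp, ph), toy_yield(temp, ph)

def full_grid_search():
    """Evaluate every combination of levels (11 x 11 = 121 mock experiments)."""
    best = max(product(LEVELS, LEVELS), key=lambda c: toy_yield(*c))
    return best, toy_yield(*best)

if __name__ == "__main__":
    print("one variable at a time:", one_variable_at_a_time((0.0, 0.0)))
    print("full grid search:      ", full_grid_search())
```

On this particular surface, a one-variable-at-a-time search started at (0.0, 0.0) settles on the lower peak near (0.2, 0.2), while the full grid search reaches the higher peak near (0.8, 0.8) at the cost of evaluating all 121 combinations. Real reaction landscapes have more variables, so the number of combinations grows much faster than in this two-variable sketch.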
A reaction can be thought of as a system that takes multiple inputs (parameters) and provides one or more outputs. Example inputs include temperature, solvent composition, pH, catalyst, droplet size, and time. Example outputs include product yield, selectivity, purity, and cost. The goal of reaction optimization described here is to select the best inputs to achieve a given output, which can be formulated as a reinforcement learning problem. To find the optimal reaction conditions, Professor Zare is searching for the critical reaction conditions to try at the next step based on previous reaction conditions and product yields. A recurrent neural network is used to model the policy for reaction optimization. The reinforcement learning system is trained first on mock reactions (random functions) and then on real reactions to improve performance. The approach, if successful, could improve understanding of fundamental features of reactivity and enable important industrial applications.
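The formulation above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the project's implementation: the class `MockReaction` (a random sum of Gaussian bumps over conditions normalized to [0, 1]), the reward magnitudes of +1 and -1, and the function `placeholder_policy` are all hypothetical. In particular, `placeholder_policy` is a simple random perturbation rule that only marks where a learned recurrent neural network policy would sit; it is not a neural network and it does not learn.

```python
import math
import random

class MockReaction:
    """A random smooth function standing in for a real reaction (assumption)."""

    def __init__(self, n_inputs, n_peaks=3, seed=None):
        rng = random.Random(seed)
        # Each peak has a random center in [0, 1]^n_inputs and a random height.
        self.peaks = [
            ([rng.random() for _ in range(n_inputs)], rng.uniform(0.2, 1.0))
            for _ in range(n_peaks)
        ]

    def run(self, conditions):
        """Return a mock yield for a list of normalized reaction conditions."""
        total = 0.0
        for center, height in self.peaks:
            dist2 = sum((c - m) ** 2 for c, m in zip(conditions, center))
            total += height * math.exp(-dist2 / 0.05)
        return total

def reward(new_yield, best_so_far):
    """+1 if the new result beats all previous ones, otherwise -1 (assumed scale)."""
    return 1.0 if new_yield > best_so_far else -1.0

def placeholder_policy(history, n_inputs, rng, step=0.1):
    """Stand-in for a learned policy: perturb the best conditions seen so far."""
    if not history:
        return [rng.random() for _ in range(n_inputs)]
    best_conditions, _ = max(history, key=lambda item: item[1])
    return [min(1.0, max(0.0, c + rng.gauss(0.0, step))) for c in best_conditions]

def run_episode(reaction, n_inputs, n_steps=30, seed=0):
    """Propose conditions from the history, run the reaction, record the reward."""
    rng = random.Random(seed)
    history = []  # list of (conditions, yield) pairs, the input to the policy
    rewards = []
    best_so_far = float("-inf")
    for _ in range(n_steps):
        conditions = placeholder_policy(history, n_inputs, rng)
        observed = reaction.run(conditions)
        rewards.append(reward(observed, best_so_far))
        best_so_far = max(best_so_far, observed)
        history.append((conditions, observed))
    return history, rewards

if __name__ == "__main__":
    mock = MockReaction(n_inputs=3, seed=1)
    history, rewards = run_episode(mock, n_inputs=3)
    print("best mock yield:", max(y for _, y in history))
    print("improving steps:", rewards.count(1.0))
```

Because the policy sees only the history of conditions and yields, the same loop could in principle be run against a mock reaction or a real experiment; swapping the placeholder for a trained recurrent model would change `placeholder_policy` without changing the rest of the loop.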