This project leverages artificial intelligence to accelerate the discovery of factors governing protein synthesis by the ribosome, a complex molecular machine. Large amounts of data on the molecular biology of the cell can now be generated rapidly, yet making sense of these data to gain understanding remains a significant challenge. In particular, so many molecular factors interact during protein synthesis that it is difficult to determine which of them regulate the speed at which the ribosome functions. In this project, artificial intelligence is used to identify putative causal features, which then become the starting point for developing physics- and chemistry-based models that explain the physical relationship between those features and the rate at which the ribosome functions. Because this approach is general, it will be transferable to other topics, thereby accelerating the process of going from data to insight across a range of problems. This research will make it possible to predict the influence of amino acid mutations on protein synthesis, information that the bioengineering and biopharmaceutical communities can exploit to optimize protein expression. Finally, this project will promote diversity in the sciences by teaching machine learning topics to high-school students from underrepresented groups and encouraging their interest in STEM fields.
Chemistry- and physics-based models of biomolecular processes are critical tools used throughout the biochemistry and molecular biology communities to explain the relationship between molecular behaviors and experimental data. A bottleneck in the development of such models is the identification of the essential features driving the biomolecular process of interest. Machine learning models, which often make accurate predictions but offer no explanatory power, have the potential to rapidly identify these essential features. This project will create a workflow that leverages this strength of machine learning to guide biophysical model development, and thereby accelerate the process of going from data to insight. The PI’s lab recently demonstrated that the identity of the transfer RNAs and amino acids in the A- and P-sites of the ribosome predictably and causally modulates the translation elongation speed at the A-site. This project will apply the machine learning workflow to model and understand the molecular origins of this effect, which are currently unknown. First, an ensemble machine learning approach will be used to identify the robust physicochemical features of amino-acid and tRNA molecules at the E-, P- and A-sites that accurately predict translation speed. Next, these robust and predictive physicochemical properties will be used as a starting point to construct physical models that explain why those properties are important. Finally, the predictions and insights from the models will be experimentally tested in vivo by a collaborator.
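As an illustration only, the idea of using an ensemble to find robust predictive features can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual workflow: it swaps in a simple bootstrap ensemble that ranks features by absolute Pearson correlation with the target, and it runs on synthetic data. The feature names (`a_site_charge`, `p_site_hydrophobicity`, `e_site_volume`, `unrelated_noise`), the function names, and the 0.9 robustness threshold are all hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def selection_frequency(features, target, n_models=200, top_k=2, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks
    among the top_k by absolute correlation with the target."""
    rng = random.Random(seed)
    n = len(target)
    counts = {name: 0 for name in features}
    for _ in range(n_models):
        # Each ensemble member sees a different resample of the data.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [target[i] for i in idx]
        scores = {
            name: abs(pearson([col[i] for i in idx], y))
            for name, col in features.items()
        }
        ranked = sorted(scores, key=scores.get, reverse=True)
        for name in ranked[:top_k]:
            counts[name] += 1
    return {name: c / n_models for name, c in counts.items()}


def make_synthetic(n=300, seed=1):
    """Synthetic stand-in data: speed depends on two features only."""
    rng = random.Random(seed)
    names = ["a_site_charge", "p_site_hydrophobicity",
             "e_site_volume", "unrelated_noise"]  # hypothetical names
    features = {name: [rng.gauss(0, 1) for _ in range(n)] for name in names}
    target = [
        2.0 * a - 1.5 * p + rng.gauss(0, 0.5)
        for a, p in zip(features["a_site_charge"],
                        features["p_site_hydrophobicity"])
    ]
    return features, target


features, target = make_synthetic()
freq = selection_frequency(features, target)
# A feature is called "robust" here if nearly every resample selects it.
robust = sorted(name for name, f in freq.items() if f >= 0.9)
print(robust)
```

Features that survive across most resamples are candidates for the physical modeling step; features that are selected only occasionally are more likely to reflect noise in a particular sample than a real dependence.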
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.