COVID-19, the disease caused by the SARS-CoV-2 coronavirus, is at the center of one of the most dangerous pandemics the world has ever known. As it spreads through the human population, the virus mutates, producing proteins that can lead to higher infection rates (infectivity) and an increased ability to cause severe disease (virulence). This project will predict the most likely mutations of the virus by combining methods from machine learning, mathematics, and biophysics. The proteins resulting from the predicted mutations will then be experimentally synthesized, and their infectivity and virulence will be tested by the project team through a collaboration with researchers in industry. This project benefits from unprecedented access to genomic data compiled on SARS-CoV-2, combined with a rich set of novel tools developed through interdisciplinary advances in data science, mathematics, and biophysics. The project will build a pipeline capable of assisting the development of vaccines and drugs against COVID-19 while simultaneously advancing the fields of machine learning and mathematical virology. The project team is led by mathematicians, molecular biologists, and biotechnology experts working in an interdisciplinary and collaborative setting. Students and postdoctoral researchers will be trained and will participate in publicly disseminating the project's findings.
The SARS-CoV-2 coronavirus is believed to have originated as a bat virus and to have evolved through a combination of sequence mutations, recombination, and natural selection to be infectious in human hosts. Some of the most relevant sequence variations occurred in the S gene, which encodes the Spike (S) protein. As SARS-CoV-2 spreads through the human population, mutations of the S gene can potentially increase viral infectivity and virulence. Within the framework of an evolutionary algorithm, the PIs will combine graph theory, topological data analysis, and computational biophysics to characterize the most likely mutations of the S protein. This interdisciplinary approach will draw upon existing experimental data from SARS-CoV-2. The PIs will collaborate with an industrial partner to experimentally design the peptides corresponding to the predicted sequences, and will use binding affinity assays and cryo-electron microscopy to test binding of the peptides to the human receptor (ACE2). The resulting pipeline will help us better understand the evolutionary landscape of viral proteins and will assist researchers in the development of antiviral drugs and vaccines. Future extensions of this work will increase our understanding of how viruses are transmitted across species and propagate in humans. The project will provide multidisciplinary student and postdoctoral training. The PIs will broadly disseminate their results, as well as the data they collect and the software they design.
With this award, the Mathematical Biology Program in the Division of Mathematical Sciences and the Chemistry of Life Processes Program in the Division of Chemistry are supporting Drs. Arsuaga, Rodriguez, and Vazquez from the University of California, Davis to study genomic variations of the SARS-CoV-2 viral spike (S) protein and predict the expansion range of transmission in human populations.
This grant is being awarded using supplemental funds made available by the Coronavirus Aid, Relief, and Economic Security (CARES) Act and allocated to MPS.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.