This RAPID award will develop new computational tools for SARS-CoV-2 research that bring to bear on COVID-19 a massive amount of evolutionary data on variations and (often neglected) divergences, through algorithms that embody basic concepts of evolution, physics, and machine learning. The project will develop, benchmark, and disseminate tools that perform deep mutational scanning of every CoV2 protein entirely in silico. A computational analysis of sequence, structure, and function across the coronavirus family will lead to the discovery of functional epitopes across the CoV2 proteome and to estimates of the functional impact of new mutations in emerging CoV2 strains. The outcome will be a comprehensive map of all actionable targets in CoV2, which may then serve, for example, as antigens for pan-coronavirus vaccines or as docking sites for repurposed drugs. A machine learning search for human host genetic factors that distinguish asymptomatic-to-mild COVID-19 infections from severe-to-lethal ones will help identify and disseminate host biomarkers of mortality, so that preventive measures and treatment modalities can be personalized to individual genetic risks.
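
To make the idea of in silico deep mutational scanning concrete, here is a minimal Python sketch that enumerates every single amino-acid substitution in a protein and scores each one. The scoring is a toy assumption, not the project's method: a hypothetical per-residue `importance` value (standing in for how sensitive fitness is at that position) is multiplied by a hypothetical `toy_substitution_cost` (how drastic the amino-acid change is), scaled to 0-100. The five-residue sequence and the importance values are made up for illustration.

```python
# Minimal sketch of in silico deep mutational scanning.
# ASSUMPTION: the scoring below is a hypothetical stand-in, not the project's implementation.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical coarse physicochemical groups, used only by the toy cost function.
GROUPS = ["AVILM", "FWY", "STNQ", "DE", "KRH", "C", "G", "P"]


def toy_substitution_cost(wt: str, mut: str) -> float:
    """Return 0.2 for a swap within the same group, 1.0 otherwise (hypothetical)."""
    for group in GROUPS:
        if wt in group and mut in group:
            return 0.2
    return 1.0


def in_silico_scan(sequence, importance, substitution_cost=toy_substitution_cost):
    """Score all 19 possible substitutions at every position of `sequence`.

    importance: one value in [0, 1] per residue (hypothetical positional sensitivity).
    Returns {variant label such as 'F2W': score in [0, 100]}.
    """
    if len(importance) != len(sequence):
        raise ValueError("need exactly one importance value per residue")
    scores = {}
    for pos, wt in enumerate(sequence, start=1):
        for mut in AMINO_ACIDS:
            if mut == wt:
                continue  # the wild-type residue is not a substitution
            scores[f"{wt}{pos}{mut}"] = 100.0 * importance[pos - 1] * substitution_cost(wt, mut)
    return scores


if __name__ == "__main__":
    toy_protein = "MFVFL"                       # made-up example sequence
    toy_importance = [0.1, 0.9, 0.5, 0.9, 0.3]  # made-up positional sensitivities
    scan = in_silico_scan(toy_protein, toy_importance)
    # Print the five highest-scoring (most disruptive) substitutions.
    for label, score in sorted(scan.items(), key=lambda kv: kv[1], reverse=True)[:5]:
        print(label, round(score, 1))
```

The enumeration loop is the generic part of any in silico scan; replacing the toy cost function and importance vector with evolution-derived quantities is where a real method would differ.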

The project applies Evolutionary Action (EA) theory, an integrative mathematical physics approach to the couplings between sequence variation, protein structure-function, and evolutionary divergence. This approach computes the evolutionary forces that shaped fitness landscapes and assesses the energy that mutations expend against these forces as they traverse those landscapes. While most natural mutations carry low energy and have little or no impact on function or evolution, a few carry significant energy and reliably change function and fitness. Preliminary data, including data on SARS-CoV-2, show that this approach will yield precise and tunable maps of protein functional epitopes and accurate scores of the impact of mutations on function. As a training feature, EA will provide a singularly accurate measure of the functional impact of each mutation, which machine learning can use to find the genes that are differentially affected in carriers of a complex phenotype versus a control group. Because EA is based on fundamental principles of evolution and protein structure-function, it is entirely general, and the computational techniques we propose can be deployed to study any virus, bacterium, or eukaryotic system and their interactions. This will lead to broad new insights into the genotype-phenotype relationship, equally useful to research, education, and biotechnology development across all kingdoms of life. This RAPID award is made by the Division of Biological Infrastructure using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.
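
The case-versus-control step can be illustrated with a minimal Python sketch. It assumes each individual's variants have already been scored on a 0-100 scale (higher meaning more damaging), aggregates them per gene as 1 - prod(1 - score/100), which is the chance that at least one variant disrupts the gene if each score is read as an independent probability, and then ranks genes by the difference in mean burden between cases and controls. Both the aggregation rule and the difference-of-means ranking are assumptions chosen for brevity; they stand in for, and are much simpler than, the trained machine learning models the project describes. Gene names and scores are hypothetical.

```python
# Minimal sketch: rank genes by how much more heavily they are hit in cases than in controls.
# ASSUMPTIONS: per-variant scores are on a 0-100 scale; the aggregation rule and the
# difference-of-means ranking are simple stand-ins for a trained machine learning model.
from statistics import mean


def gene_burden(variant_scores):
    """Aggregate one person's variant scores in one gene as 1 - prod(1 - s/100).

    Treats each score as an independent probability that the variant disrupts the gene,
    so the result is the probability that at least one variant does. Empty list -> 0.0.
    """
    p_intact = 1.0
    for s in variant_scores:
        p_intact *= 1.0 - s / 100.0
    return 1.0 - p_intact


def rank_genes(cases, controls):
    """cases, controls: non-empty lists of {gene: [variant scores]} dicts, one per person.

    Returns [(gene, mean case burden - mean control burden)], largest difference first.
    """
    genes = set()
    for person in cases + controls:
        genes.update(person)
    deltas = {}
    for gene in genes:
        # People with no scored variants in a gene contribute a burden of 0.0.
        case_mean = mean(gene_burden(p.get(gene, [])) for p in cases)
        control_mean = mean(gene_burden(p.get(gene, [])) for p in controls)
        deltas[gene] = case_mean - control_mean
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy cohort: two cases and two controls.
    cases = [{"GENE_A": [90.0], "GENE_B": [10.0]}, {"GENE_A": [80.0, 50.0]}]
    controls = [{"GENE_B": [20.0]}, {"GENE_A": [5.0]}]
    for gene, delta in rank_genes(cases, controls):
        print(gene, round(delta, 3))
```

In a real analysis, the per-gene burdens would be features for a classifier, and the ranking would be paired with a significance test rather than read off raw mean differences.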

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 2032904
Program Officer: Jean Gao
Budget Start: 2020-06-01
Budget End: 2021-05-31
Fiscal Year: 2020
Total Cost: $200,000
Organization Name: Baylor College of Medicine
City: Houston
State: TX
Country: United States
Zip Code: 77030