SARS-CoV-2, the virus that causes COVID-19, has caused a global pandemic with 4.2M cases and 290K deaths worldwide (as of May 12, 2020). In the United States, there are over 1.3M cases and 81K deaths. Locally, Arizona has over 11K cases and 562 deaths. In response to this public health emergency, several studies have described patient characteristics in terms of signs, symptoms, and clinical endpoints. In addition, epidemiologists and infectious disease researchers have used next-generation sequencing to produce complete viral genomes for clinical and epidemiologic investigation. Genomic epidemiology has enabled scientists to understand local transmission and to identify the geographic sources of introductions from other states and countries. However, most SARS-CoV-2 sequencing (as with other viruses) is performed outside state and local health departments, by organizations such as the Centers for Disease Control and Prevention (CDC), universities, or private laboratories. Once a pathogen has been sequenced, it can be difficult to link the sequence back to the case-investigation data collected by the health department, and without that link between viral sequences and epidemiologic case data, genomic epidemiology is hindered. There is limited research on how to link pathogen sequences to epidemiologic case data, especially for COVID-19. Thus, despite the abundance of clinical and epidemiologic data collected during this pandemic, more informatics research is needed to understand how to link viral genetic and epidemiologic data and to demonstrate the value of doing so for disease surveillance. The goal of this supplement is to link epidemiologic data from COVID-19-positive patients in Arizona with viral genetics from sequenced isolates to better understand the relationship between viral genetics and epidemiologic and clinical phenotypes.
We will accomplish this using Arizona's disease surveillance system together with the sequences and metadata published in online nucleic acid databases. We will use different probabilistic matching strategies to link these two data sources (Aim 1) and then use Bayesian phylogenetics and phylogeography to study the clustering of epidemiologic cases (Aim 2). Epidemiologists can use these findings to understand how local viruses cluster genetically in relation to specific epidemiologic and clinical cases. While disease severity depends on individual immune response and environmental factors, linking each viral sequence to its correct epidemiologic case could also support hypothesis generation for future reverse genetics and immunological studies in animal models.
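Probabilistic record linkage is commonly formalized with the Fellegi-Sunter model, in which each field compared between two records contributes a log-likelihood weight for agreement or disagreement, and the summed weight is compared against thresholds. The abstract does not specify which matching strategies Aim 1 will use, so the following is only a minimal sketch under stated assumptions: the field names (`county`, `sex`, `age_group`, `collection_week`), the m- and u-probabilities, and the decision thresholds are all hypothetical placeholders, not values from this project.

```python
import math

# HYPOTHETICAL parameters for fields assumed to be shared by a surveillance
# case record and a sequence's metadata record.
# m = P(field agrees | both records describe the same case)
# u = P(field agrees | the records describe different cases)
FIELD_PARAMS = {
    "county": (0.95, 0.10),
    "sex": (0.98, 0.50),
    "age_group": (0.90, 0.10),
    "collection_week": (0.85, 0.05),
}


def field_weight(agree: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log2 weight for one field: agreement or disagreement."""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(case: dict, seq_meta: dict) -> float:
    """Sum field weights over fields present in both records.

    Fields missing from either record contribute nothing to the score.
    """
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = case.get(field), seq_meta.get(field)
        if a is None or b is None:
            continue
        total += field_weight(a == b, m, u)
    return total


def classify(score: float, upper: float = 6.0, lower: float = 0.0) -> str:
    """Map a total weight to a link decision using HYPOTHETICAL thresholds."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match (clerical review)"


def best_candidate(seq_meta: dict, cases: list) -> tuple:
    """Return (index, score) of the highest-scoring case record for one sequence."""
    scores = [match_score(c, seq_meta) for c in cases]
    i = max(range(len(scores)), key=scores.__getitem__)
    return i, scores[i]


if __name__ == "__main__":
    case = {"county": "Maricopa", "sex": "F", "age_group": "40-49",
            "collection_week": "2020-W14"}
    other = {"county": "Pima", "sex": "M", "age_group": "70-79",
             "collection_week": "2020-W20"}
    idx, score = best_candidate(case, [other, case])
    print(idx, round(score, 2), classify(score))
```

In practice the m- and u-probabilities would usually be estimated from the data (for example, with the EM algorithm) rather than set by hand, and record pairs scoring in the middle band would be routed to manual review.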

Public Health Relevance

This biomedical informatics project will leverage probabilistic matching to link reportable disease data with SARS-CoV-2 viral sequence data. This will support the analysis of local SARS-CoV-2 cases by linking them with the genetics of the virus for ongoing public health surveillance.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
3R01LM013129-02S1
Application #
10166255
Program Officer
Sim, Hua-Chuan
Project Start
2020-07-01
Project End
2021-06-30
Budget Start
2020-07-01
Budget End
2021-06-30
Support Year
2
Fiscal Year
2020
Name
Arizona State University-Tempe Campus
Department
Engineering (All Types)
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
943360412
City
Tempe
State
AZ
Country
United States
Zip Code
85287