Control of SARS-CoV-2 infection and related pathogenic processes requires an understanding of how host and viral genetic factors drive disease outcome. This study is designed to characterize the genomic epidemiology of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and to define host genetic effects on the outcome of viral infection. As part of this research study, we will perform whole-genome sequencing on samples from 1,000 COVID-19-positive patients at Stanford Health Care. Stanford University Hospitals and Clinics has piloted a robust and reproducible next-generation sequencing assay for joint viral and host genome detection. An initial pilot of 319 nasal swab samples and 15 buffy coats from ICU patients demonstrates that we are able to extract, sequence, and analyze low-pass host genomes and RNA-seq data from all nasal swabs. From 180 of these swabs, we obtained full viral genome sequences. Genetic ancestry analysis of the 319 host genomes shows overrepresentation of Hispanic/Latino individuals, Pacific Islanders, and other at-risk populations, recapitulating the ethnic disparity we and others have seen among cases. In this project, we will sequence additional samples to bring our total to 1,000 COVID-19-positive samples, including samples from inpatients, outpatients, and severely and critically ill patients. We will scale this project over the next 12 months and follow infected, severe, critical, and recovered patients to characterize the multiomic profile of disease severity, from mild to severe to critical illness. This will be accomplished through two Aims.
The first Aim focuses on host genetic sequencing from nasopharyngeal (NP) swabs, from which we expect to recover sufficient genetic material to characterize the host genome to high imputation quality. We will also characterize host genetic ancestry and background polygenic risk scores for a range of related traits. A set of 100 contemporaneous COVID-19-negative samples will also be sequenced to serve as controls and as a comparison for background ancestry in the treatment population. In parallel, we will collect direct-to-consumer (DTC) derived genetic data from a diverse population sample.
Our second Aim focuses on the viral genome data obtained from NP swabs. We will characterize as much of the SARS-CoV-2 genome per sample as possible and detect co-infections in the swab sample. Our goal is to understand the limit of detection of the technology and its impact on reproducing viral dynamics, and to characterize alternative splicing in the SARS-CoV-2 genome over the time course of infection. Completion of the Aims outlined here will pilot next-generation sequencing for surveillance of SARS-CoV-2 host and viral genetics in Northern California in a form that can be replicated across the world. This work is critical for preparedness for a second wave and for detecting co-infections among those infected with SARS-CoV-2. It will also contribute significantly to the emergency tracking of the pandemic during the second half of 2020 and beyond.
Our goal is to understand the genetic underpinnings of the host response to infection by SARS coronavirus 2 (SARS-CoV-2). Our pilot project to sequence nasal swab samples and buffy coats from ICU patients demonstrates that we are able to extract, sequence, and analyze low-pass host genomes and RNA-seq data from all nasal swabs. We will scale this project over the next 12 months to sequence and follow 1,000 infected, severe, critical, and recovered patients, and to characterize what (if any) differences in host and pathogen genetics exist between cases and controls.