Despite clear ties between many viruses and human diseases, the relationship between genetic susceptibility to disease and viral infection remains largely unexplored. For most diseases, genetic factors explain only a small fraction of susceptibility, suggesting that synergism with environmental factors such as viral infection plays a key role in their etiology. Viruses commonly interact with their host by manipulating host gene expression through regulatory interactions between viral proteins and the host genome. Many disease-associated genetic variants are non-coding, and are thus likely to affect gene regulation. Our hypothesis is that specific genetic variants associated with human diseases act by influencing the binding of viral transcription factors (TFs) to host DNA. We propose a computational system that will use available data to identify these variants and produce testable hypotheses about the viral proteins whose binding they influence:
Aim 1. Construct a regulatory network linking viral proteins to host genomic binding locations. We will build a network that combines the genomic binding locations of viral proteins with the physical interactions between viral and human proteins.
Aim 2. Determine the DNA binding preferences of TFs encoded by viruses that infect humans. We will experimentally determine the DNA binding preferences of viral TFs using protein binding microarrays, and supplement these data with previously established viral TF DNA binding motifs.
Aim 3. Create a system to identify disease-associated genetic variants influencing viral protein binding. We will create a computational system that intersects disease-associated genetic variants with the regulatory network developed in Aim 1 and the motifs determined in Aim 2. This system will systematically search for variants that are likely to alter the binding of viral proteins by modifying TF binding sites.
The freely available database and tools developed in this study will facilitate new genomic analyses investigating the impact of disease-associated genetic variants on virus-host interactions. The proposed research is innovative because it will be the first study to combine the available relevant data for studying this mechanism in a single predictive framework. Its contribution will be significant because it will enable investigation of the effect of disease-associated genetic variants on viral protein binding for any disease with a known or suspected viral component. Massive increases in genome-wide association study (GWAS) and functional genomics data now make it possible to identify such interactions systematically from existing data, and the resource will support analysis of both currently available and new GWAS data. Based on our expertise, we will focus on predictions relevant to a specific disease (lupus) and specific viruses (Epstein-Barr virus and herpes simplex virus 1). We expect that the mechanisms identified by this system will help reveal environment-genotype links for many human diseases, opening new avenues of exploration into the mechanisms of disease onset and progression.
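The search described in Aim 3 can be illustrated with a minimal sketch. It assumes a simple log-odds motif model with a uniform background and half-open genomic intervals; the motif values, function names, and threshold-free scoring below are hypothetical placeholders for illustration, not data or methods from this project.

```python
import math

# Hypothetical position frequency matrix for an illustrative viral TF motif.
# Toy values only, not measured binding data. One dict per motif position.
TOY_MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def overlaps(position, regions):
    """True if position falls inside any half-open (start, end) binding region."""
    return any(start <= position < end for start, end in regions)


def log_odds(motif, site):
    """Log2 odds of a site under the motif versus the uniform background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(motif, site))


def best_score(motif, seq, var_idx):
    """Best log-odds score over all motif windows that cover index var_idx.

    Assumes seq is at least as long as the motif.
    """
    width = len(motif)
    first = max(0, var_idx - width + 1)
    last = min(var_idx, len(seq) - width)
    return max(log_odds(motif, seq[s:s + width]) for s in range(first, last + 1))


def binding_change(motif, left_flank, ref, alt, right_flank):
    """Score difference (alt minus ref) for a single-nucleotide variant."""
    idx = len(left_flank)
    ref_seq = left_flank + ref + right_flank
    alt_seq = left_flank + alt + right_flank
    return best_score(motif, alt_seq, idx) - best_score(motif, ref_seq, idx)


# A G>T variant inside a toy binding region, disrupting the site "AGCT".
in_region = overlaps(150, [(100, 200)])
delta = binding_change(TOY_MOTIF, "TTA", "G", "T", "CTT")
```

A negative `delta` for a variant that also lies inside a binding region would mark it as a candidate for weakened viral TF binding; a real system would substitute measured motifs, actual binding locations, and a calibrated significance threshold.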
The proposed research is relevant to public health because many human diseases have established epidemiological ties to viral infection, yet the mechanisms underlying these associations remain poorly understood. This work will facilitate new analyses of virus-human interactions and of the influence of disease-associated genetic variants on these interactions. The project is relevant to NIH's mission because it will deepen our understanding of how genetic variants and viruses synergize to influence the progression of many diseases.