This proposal focuses on the development and optimization of three complementary technologies that will improve our ability to efficiently characterize sequence variants uncovered in ongoing re-sequencing efforts of patients with both single-gene and complex disorders. While the technologies are widely applicable to human disease studies and associated variants, our proof-of-principle analysis will focus on putative regulatory variants located in non-coding regions of the genome. Specifically, we will analyze sequence variants that are associated with changes in expression of individual adjacent genes (cis-expression quantitative trait loci, cis-eQTL) and with lipid-related traits in families of the San Antonio Family Heart Study (SAFHS). Identifying and characterizing functional regulatory sequence variants that affect gene expression related to human diseases requires the development and application of novel analysis tools that allow the characterization of protein binding both in vitro and in vivo. We propose to use the resources of the SAFHS, which include complete whole genome sequence (WGS) data, eQTL information and disease association, and cell lines available from all participants, to develop three independent complementary technologies for the functional characterization of regulatory variants influencing lipid variation:
In vitro Technology: We will develop high-throughput dsDNA arrays for in vitro analysis of allele-specific protein-DNA interactions, and evaluate 1,000 variants associated with plasma lipid traits and individual gene expression levels (cis-eQTL) with p < 5 × 10^-7 in the SAFHS cohort. Representative variants will be validated by EMSA, and binding proteins will be identified using mass spectrometry.
In silico Technology: We will apply Bayesian analysis approaches incorporating empirically derived features (e.g., cis-acting effect sizes, lipid-associated effect sizes, allele frequency) and bioinformatically derived features (e.g., regulatory potential, nucleosome accessibility) to develop computational tools that predict likely functional regulatory variants from WGS data.
In vivo Technology: We will develop an analysis approach to validate allele-specific protein binding to regulatory variants directly in cells. Using cell lines from the SAFHS cohort of known genotype for putative functional variants influencing gene expression, we will isolate target regions of cross-linked chromatin by hybridization capture and confirm allele-specific protein binding.
In addition to developing these novel in vitro, in silico, and in vivo technologies, which can be used to characterize both common and rare variation in the human genome, our proposed work will identify and validate regulatory variants associated with lipid traits in humans. This will provide a unique resource for GWAS and other studies dissecting the genetic basis of human diseases.
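As an illustration of the in silico aim described above, the following is a minimal sketch of how a Bayesian approach could combine several lines of evidence into a posterior probability that a variant is functional. It is not the proposal's method: it assumes a naive-Bayes model in which each feature contributes an independent likelihood ratio, and the function name, feature names, prior, and likelihood-ratio values are all hypothetical placeholders rather than SAFHS results.

```python
import math


def posterior_functional(prior, likelihood_ratios):
    """Posterior probability that a variant is functional.

    Assumes (hypothetically) that features are conditionally independent,
    so each contributes a likelihood ratio
    P(feature | functional) / P(feature | non-functional).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    # Work in log-odds so that many features combine by simple addition.
    log_odds = math.log(prior / (1.0 - prior))
    for name, ratio in likelihood_ratios.items():
        if ratio <= 0.0:
            raise ValueError(f"likelihood ratio for {name!r} must be positive")
        log_odds += math.log(ratio)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence for one variant. These numbers are invented for
# illustration only and are not derived from any dataset.
variant_evidence = {
    "cis_eqtl_effect_size": 4.0,      # empirically derived feature
    "lipid_effect_size": 2.0,         # empirically derived feature
    "regulatory_potential": 3.0,      # bioinformatically derived feature
    "nucleosome_accessibility": 1.5,  # bioinformatically derived feature
}

# Hypothetical prior: 1% of candidate variants are assumed to be functional.
posterior = posterior_functional(0.01, variant_evidence)
print(f"Posterior probability of being functional: {posterior:.3f}")
```

In practice the likelihood ratios would have to be estimated from training data, and correlated features (for example, the two effect sizes) would violate the independence assumption made here.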
Recent large-scale genetic studies have uncovered a large number of sequence variants that affect common diseases such as obesity and cardiovascular disease by altering the expression of individual genes. Unfortunately, current technologies for functionally characterizing these sequence variants are not sufficient to verify these findings or to help explain the cellular mechanisms contributing to these disorders. We propose to develop three independent complementary technologies to accelerate and improve the in vitro high-throughput screening, in silico prediction, and in vivo validation of the effect of sequence variants on protein-DNA interactions and gene expression.