This proposal focuses on the development and optimization of three complementary technologies that will improve our ability to efficiently characterize sequence variants uncovered in ongoing re-sequencing efforts of patients with both single-gene and complex disorders. While the technologies are widely applicable to human disease studies and associated variants, our proof-of-principle analysis will focus on putative regulatory variants located in non-coding regions of the genome. Specifically, we will analyze sequence variants that are associated with changes in expression of individual adjacent genes (cis-expression quantitative trait loci, cis-eQTL) and with lipid-related traits in families of the San Antonio Family Heart Study (SAFHS). Identifying and characterizing functional regulatory sequence variants that affect gene expression related to human diseases requires the development and application of novel analysis tools that allow the characterization of protein binding both in vitro and in vivo. We propose to use the resources of the SAFHS, which include complete whole genome sequence (WGS) data, eQTL information and disease association, and cell lines available from all participants, to develop three independent complementary technologies for the functional characterization of regulatory variants influencing lipid variation:

In vitro Technology: We will develop high-throughput dsDNA arrays for in vitro analysis of allele-specific protein-DNA interactions, and evaluate 1000 variants associated with plasma lipid traits and individual gene expression levels (cis-eQTL) with p < 5 × 10^-7 in the SAFHS cohort. Representative variants will be validated by EMSA, and binding proteins will be identified using mass spectrometry.

In silico Technology: We will apply Bayesian analysis approaches incorporating empirically derived features (e.g., cis-acting effect sizes, lipid-associated effect sizes, allele frequency) and bioinformatically derived features (e.g., regulatory potential, nucleosome accessibility) to develop computational tools that statistically predict likely functional regulatory variants from WGS data.

In vivo Technology: We will develop an analysis approach to validate allele-specific protein binding to regulatory variants directly in cells. Using cell lines from the SAFHS cohort with known genotypes at putative functional variants influencing gene expression, we will isolate target regions of cross-linked chromatin by hybridization capture and confirm allele-specific protein binding.

In addition to developing these novel in vitro, in silico, and in vivo technologies, which can be used to characterize both common and rare variation in the human genome, our proposed work will identify and validate regulatory variants associated with lipid traits in humans. This will provide a unique resource for GWAS and other studies dissecting the genetic basis of human diseases.
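As a rough illustration of the in silico idea only, the following Python sketch combines a prior probability with per-feature likelihood ratios under a naive independence assumption to score candidate variants. It is not the proposal's actual model: the function names, variant identifiers, prior, and likelihood ratios are all hypothetical, and only the p-value threshold of 5 × 10^-7 comes from the abstract.

```python
import math

# Significance threshold quoted in the abstract for candidate variants.
P_THRESHOLD = 5e-7


def posterior_functional(prior, likelihood_ratios):
    """Posterior probability that a variant is functional.

    Assumes (hypothetically) that each feature contributes an independent
    likelihood ratio P(feature | functional) / P(feature | non-functional).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(lr)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_candidates(variants, prior):
    """Keep variants passing the p-value threshold, ranked by posterior."""
    scored = [
        (v["id"], posterior_functional(prior, v["likelihood_ratios"]))
        for v in variants
        if v["p_value"] < P_THRESHOLD
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up example data; identifiers and numbers are illustrative only.
example_variants = [
    {"id": "variant_A", "p_value": 1e-9, "likelihood_ratios": [4.0, 2.5]},
    {"id": "variant_B", "p_value": 2e-8, "likelihood_ratios": [0.5, 1.2]},
    {"id": "variant_C", "p_value": 1e-3, "likelihood_ratios": [9.0, 9.0]},
]

ranking = rank_candidates(example_variants, prior=0.01)
```

In this toy example `variant_C` is dropped because its p-value does not pass the threshold, regardless of how favorable its feature scores are. A real model would estimate the likelihood ratios from data and would need to account for correlation between features.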

Public Health Relevance

Recent large-scale genetic studies have uncovered a large number of sequence variants that affect common diseases such as obesity and cardiovascular disease by altering the expression of individual genes. Unfortunately, current technologies for functionally characterizing these sequence variants are not sufficient to verify these findings or to help explain the cellular mechanisms contributing to these disorders. We propose to develop three independent complementary technologies to accelerate and improve the in vitro high-throughput screening, in silico prediction, and in vivo validation of the effect of sequence variants on protein-DNA interactions and gene expression.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM109099-03
Application #: 9096166
Study Section: Special Emphasis Panel (ZGM1)
Program Officer: Krasnewich, Donna M
Project Start: 2014-07-24
Project End: 2018-06-30
Budget Start: 2016-07-01
Budget End: 2017-06-30
Support Year: 3
Fiscal Year: 2016
Total Cost:
Indirect Cost:
Name: Texas Biomedical Research Institute
Department:
Type:
DUNS #: 007936834
City: San Antonio
State: TX
Country: United States
Zip Code: 78245
Guillen-Ahlers, Hector; Rao, Prahlad K; Perumalla, Danu S et al. (2018) Adaptation of Hybridization Capture of Chromatin-associated Proteins for Proteomics to Mammalian Cells. J Vis Exp :
Proffitt, J Michael; Glenn, Jeremy; Cesnik, Anthony J et al. (2017) Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys. BMC Genomics 18:877
Holden, Matthew T; Carter, Matthew C D; Ting, Shannon K et al. (2017) Parallel DNA Synthesis on Poly(ethylene terephthalate). Chembiochem 18:1914-1916
Guillen-Ahlers, Hector; Rao, Prahlad K; Levenstein, Mark E et al. (2016) HyCCAPP as a tool to characterize promoter DNA-protein interactions in Saccharomyces cerevisiae. Genomics 107:267-73
Holden, Matthew T; Carter, Matthew C D; Wu, Cheng-Hsien et al. (2015) Photolithographic Synthesis of High-Density DNA and RNA Arrays on Flexible, Transparent, and Easily Subdivided Plastic Substrates. Anal Chem 87:11420-8