While genome-wide association studies (GWAS) have identified over 3000 loci associated with common disease, the mechanism by which variation at these loci is pathogenic remains unclear. The ENCODE project provides a unique resource of extensive functional genomic data that can be used to close this knowledge gap. The overall objective of this application is to develop computational methods that integrate data from the ENCODE project with GWAS data to predict simultaneously the relevant tissue type and the functionally important variants for a given disease. A secondary objective is to validate these approaches through the analysis of data on cancer and autoimmune disease. The central hypothesis is that loci identified through GWAS tag functional SNPs that cause disease by altering transcription factor binding sites, thereby dysregulating genes in the relevant tissue type (e.g., pre-neoplastic tissue for cancer and immune cells for autoimmune disease). The rationale that underlies this research is that the ENCODE data represents a rich resource for understanding GWAS results and that the methods to be developed here will enable similar analyses of other diseases. The research team is well prepared to undertake the proposed research because of its combined expertise in the conduct and analysis of GWAS, functional and bioinformatics follow-up of GWAS hits, and machine learning approaches to understanding genome-scale data, including transcription factor binding. The central hypothesis will be tested through these aims: 1) Determine if ENCODE data generated in the appropriate tissue type can be used to find putative functional transcriptional regulatory variants at disease-associated loci. This will be achieved by asking if lymphoma risk SNPs tend to alter transcription factor binding sites and associate with expression of nearby genes in lymphoblastoid cell lines. 2) Using ENCODE data, identify the cell types and tissue(s) important for a given disease. Relevant cell types will be identified by determining those cell types in which genes are more likely to be expressed near disease risk loci in the ENCODE data. 3) Identify putative functional SNPs in GWAS using ENCODE when complete functional genomic data is not available for the appropriate tissue type. To extend these analyses beyond the few cell types extensively studied in ENCODE, DNase hypersensitivity data from the relevant tissue will be linked with ChIP-Seq transcription factor binding data from other tissues to allow identification of variants that alter transcription factor binding. This research is significant because it will provide new insight into the biology of cancer and autoimmune disease. More importantly, it will provide the tools necessary to use the ENCODE data to link disease risk loci with functional variants and potential mechanistic explanations.
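
The cell-type prioritization idea in the aims can be illustrated with a small sketch. This is not the applicants' method. It assumes a deliberately simple stand-in: for each cell type, count how many risk SNPs fall inside that cell type's open-chromatin intervals (for example, DNase hypersensitive sites) and compare against a background SNP set with a one-sided hypergeometric test. All function names, the cell-type labels, and the coordinates are hypothetical toy values, and a real analysis would also need to account for linkage disequilibrium and matched background selection.

```python
from bisect import bisect_right
from math import comb  # requires Python 3.8+


def build_index(intervals):
    """Sort and merge half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [s for s, _ in merged], merged


def overlaps(pos, index):
    """Return True if pos lies inside any merged interval."""
    starts, merged = index
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < merged[i][1]


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def enrichment_by_cell_type(risk_snps, background_snps, peaks_by_cell_type):
    """Map each cell type to (risk SNPs in peaks, one-sided p-value).

    risk_snps must be a subset of background_snps; both are sets of
    (chromosome, position) tuples. peaks_by_cell_type maps a cell-type
    label to {chromosome: [(start, end), ...]}.
    """
    results = {}
    for cell_type, peaks in peaks_by_cell_type.items():
        index = {chrom: build_index(iv) for chrom, iv in peaks.items()}

        def in_peak(snp):
            chrom, pos = snp
            return chrom in index and overlaps(pos, index[chrom])

        K = sum(in_peak(s) for s in background_snps)
        k = sum(in_peak(s) for s in risk_snps)
        p = hypergeom_sf(k, len(background_snps), K, len(risk_snps))
        results[cell_type] = (k, p)
    return results


# Toy data only: ten background SNPs, three of which are "risk" SNPs.
background = {("chr1", p) for p in range(100, 1100, 100)}
risk = {("chr1", 100), ("chr1", 200), ("chr1", 300)}
peaks = {
    "lymphoblastoid": {"chr1": [(50, 350)]},
    "hepatocyte": {"chr1": [(650, 1050)]},
}
results = enrichment_by_cell_type(risk, background, peaks)
```

In this toy input all three risk SNPs fall in the intervals labeled "lymphoblastoid" and none in those labeled "hepatocyte", so the first label receives the smaller p-value. Ranking cell types by such a score is one simple way to express the prioritization described in aim 2.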

Public Health Relevance

Over the past five years, numerous genetic changes associated with common disease have been identified. The next step in using this information to improve human health is to determine how these changes alter cellular function. Here, methods to integrate data from the ENCODE project with these genetic studies are proposed, enabling investigators to generate hypotheses regarding the function of these genetic changes.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01HG007033-02
Application #
8546275
Study Section
Special Emphasis Panel (ZHG1-HGR-M (M2))
Program Officer
Pazin, Michael J
Project Start
2012-09-17
Project End
2015-06-30
Budget Start
2013-07-01
Budget End
2014-06-30
Support Year
2
Fiscal Year
2013
Total Cost
$526,845
Indirect Cost
$169,453
Name
Sloan-Kettering Institute for Cancer Research
Department
Type
DUNS #
064931884
City
New York
State
NY
Country
United States
Zip Code
10065
Kilpeläinen, Tuomas O; Carli, Jayne F Martin; Skowronski, Alicja A et al. (2016) Genome-wide meta-analysis uncovers novel loci influencing circulating leptin levels. Nat Commun 7:10494
Yuan, Hua; Liu, Hongliang; Liu, Zhensheng et al. (2016) A Novel Genetic Variant in Long Non-coding RNA Gene NEXN-AS1 is Associated with Risk of Lung Cancer. Sci Rep 6:34234
Machiela, Mitchell J; Lan, Qing; Slager, Susan L et al. (2016) Genetically predicted longer telomere length is associated with increased risk of B-cell lymphoma subtypes. Hum Mol Genet 25:1663-76
Liu, Xiaoming; White, Simon; Peng, Bo et al. (2016) WGSA: an annotation pipeline for human genome sequencing studies. J Med Genet 53:111-2
Hakimi, A Ari; Ostrovnaya, Irina; Jacobsen, Anders et al. (2016) Validation and genomic interrogation of the MET variant rs11762213 as a predictor of adverse outcomes in clear cell renal cell carcinoma. Cancer 122:402-10
Berndt, Sonja I; Camp, Nicola J; Skibola, Christine F et al. (2016) Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat Commun 7:10933
Vijai, Joseph; Wang, Zhaoming; Berndt, Sonja I et al. (2015) A genome-wide association study of marginal zone lymphoma shows association to the HLA region. Nat Commun 6:5751
Trynka, Gosia; Westra, Harm-Jan; Slowikowski, Kamil et al. (2015) Disentangling the Effects of Colocalizing Genomic Annotations to Functionally Prioritize Non-coding Variants within Complex-Trait Loci. Am J Hum Genet 97:139-52
Hayes, James E; Trynka, Gosia; Vijai, Joseph et al. (2015) Tissue-Specific Enrichment of Lymphoma Risk Loci in Regulatory Elements. PLoS One 10:e0139360
González, Alvaro J; Setty, Manu; Leslie, Christina S (2015) Early enhancer establishment and regulatory locus complexity shape transcriptional programs in hematopoietic differentiation. Nat Genet 47:1249-59

Showing the most recent 10 out of 20 publications