While genome-wide association studies (GWAS) have identified over 3000 loci associated with common disease, the mechanism by which variation at these loci is pathogenic remains unclear. The ENCODE project provides a unique resource of extensive functional genomic data that can be used to close this knowledge gap. The overall objective of this application is to develop computational methods that integrate data from the ENCODE project with GWAS data to simultaneously predict the relevant tissue type and the functionally important variants for a given disease. A secondary objective is to validate these approaches through the analysis of data on cancer and autoimmune disease. The central hypothesis is that loci identified through GWAS tag functional SNPs that cause disease by altering transcription factor binding sites, thereby dysregulating genes in the relevant tissue type (e.g., pre-neoplastic tissue for cancer and immune cells for autoimmune disease). The rationale that underlies this research is that the ENCODE data represents a rich resource for understanding GWAS results and that the methods to be developed here will enable similar analyses of other diseases. The research team is well prepared to undertake the proposed research because of their combined expertise in the conduct and analysis of GWAS, functional and bioinformatics follow-up of GWAS hits, and machine learning approaches to understanding genome-scale data, including transcription factor binding. The central hypothesis will be tested through these aims: 1) Determine if ENCODE data generated in the appropriate tissue type can be used to find putative functional transcriptional regulatory variants at disease-associated loci. This will be achieved by asking if lymphoma risk SNPs tend to alter transcription factor binding sites and associate with expression of nearby genes in lymphoblastoid cell lines. 2) Using ENCODE data, identify the cell types and tissue(s) important for a given disease. Relevant cell types will be identified by determining those cell types in which genes are more likely to be expressed near disease risk loci in the ENCODE data. 3) Identify putative functional SNPs in GWAS using ENCODE when complete functional genomic data is not available for the appropriate tissue type. To extend these analyses beyond the few cell types extensively studied in ENCODE, DNase hypersensitivity data from the relevant tissue will be linked with ChIP-Seq transcription factor binding data from other tissues to allow identification of variants that alter transcription factor binding. This research is significant because it will provide new insight into the biology of cancer and autoimmune disease. More importantly, it will provide the tools necessary to use the ENCODE data to link disease risk loci with functional variants and potential mechanistic explanations.
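The linking step described in aim 3 could be sketched roughly as follows. This is an illustrative outline only, not the project's actual method: it assumes peaks are given as half-open (chromosome, start, end) intervals, and the function names, SNP identifiers, and coordinates are all hypothetical. It nominates SNPs that fall inside both a DNase hypersensitive site from the relevant tissue and a ChIP-Seq binding site measured in another tissue; it does not assess whether an allele actually changes binding.

```python
from collections import defaultdict


def index_by_chrom(peaks):
    """Group half-open (chrom, start, end) intervals by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    return by_chrom


def overlaps(by_chrom, chrom, pos):
    """Return True if position `pos` lies inside any interval on `chrom`."""
    return any(start <= pos < end for start, end in by_chrom.get(chrom, []))


def candidate_snps(snps, dnase_peaks, chip_peaks):
    """Return IDs of SNPs inside both a DNase peak and a ChIP-Seq peak.

    snps        -- iterable of (snp_id, chrom, pos)
    dnase_peaks -- open chromatin in the disease-relevant tissue (assumed input)
    chip_peaks  -- transcription factor binding sites from another tissue
    """
    dnase = index_by_chrom(dnase_peaks)
    chip = index_by_chrom(chip_peaks)
    return [
        snp_id
        for snp_id, chrom, pos in snps
        if overlaps(dnase, chrom, pos) and overlaps(chip, chrom, pos)
    ]


# Toy, made-up coordinates purely for illustration.
toy_snps = [("snp_a", "chr1", 150), ("snp_b", "chr1", 450), ("snp_c", "chr2", 150)]
toy_dnase = [("chr1", 100, 200), ("chr1", 400, 500)]
toy_chip = [("chr1", 120, 180), ("chr2", 100, 200)]
print(candidate_snps(toy_snps, toy_dnase, toy_chip))
```

A linear scan per SNP is adequate for a toy example; genome-scale peak sets would more likely call for an interval tree or a sorted sweep.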

Public Health Relevance

Over the past five years, numerous genetic changes associated with common disease have been identified. The next step to use this information to improve human health is to determine how these changes alter cellular function. Here, methods to integrate data from the ENCODE project with these genetic studies are proposed to enable investigators to generate hypotheses regarding the function of these genetic changes.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZHG1-HGR-M (M2))
Program Officer
Gilchrist, Daniel A
Icahn School of Medicine at Mount Sinai
Schools of Medicine
New York
United States
Pelossof, Raphael; Fairchild, Lauren; Huang, Chun-Hao et al. (2017) Prediction of potent shRNAs with a sequential classification algorithm. Nat Biotechnol 35:350-353
Perez, Alexendar R; Pritykin, Yuri; Vidigal, Joana A et al. (2017) GuideScan software for improved single and paired CRISPR guide RNA design. Nat Biotechnol 35:347-349
Law, Philip J; Berndt, Sonja I; Speedy, Helen E et al. (2017) Genome-wide association analysis implicates dysregulation of immunity genes in chronic lymphocytic leukaemia. Nat Commun 8:14175
Kilpeläinen, Tuomas O; Carli, Jayne F Martin; Skowronski, Alicja A et al. (2016) Genome-wide meta-analysis uncovers novel loci influencing circulating leptin levels. Nat Commun 7:10494
Machiela, Mitchell J; Lan, Qing; Slager, Susan L et al. (2016) Genetically predicted longer telomere length is associated with increased risk of B-cell lymphoma subtypes. Hum Mol Genet 25:1663-76
Pouget, Jennie G; Gonçalves, Vanessa F; Schizophrenia Working Group of the Psychiatric Genomics Consortium et al. (2016) Genome-Wide Association Studies Suggest Limited Immune Gene Enrichment in Schizophrenia Compared to 5 Autoimmune Diseases. Schizophr Bull 42:1176-84
Liu, Xiaoming; White, Simon; Peng, Bo et al. (2016) WGSA: an annotation pipeline for human genome sequencing studies. J Med Genet 53:111-2
Yuan, Hua; Liu, Hongliang; Liu, Zhensheng et al. (2016) A Novel Genetic Variant in Long Non-coding RNA Gene NEXN-AS1 is Associated with Risk of Lung Cancer. Sci Rep 6:34234
Berndt, Sonja I; Camp, Nicola J; Skibola, Christine F et al. (2016) Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat Commun 7:10933
Hakimi, A Ari; Ostrovnaya, Irina; Jacobsen, Anders et al. (2016) Validation and genomic interrogation of the MET variant rs11762213 as a predictor of adverse outcomes in clear cell renal cell carcinoma. Cancer 122:402-10

Showing the most recent 10 out of 24 publications