While genome-wide association studies (GWAS) have identified over 3,000 loci associated with common disease, the mechanism by which variation at these loci is pathogenic remains unclear. The ENCODE project provides a unique resource of extensive functional genomic data that can be used to close this knowledge gap. The overall objective of this application is to develop computational methods to integrate data from the ENCODE project with GWAS data to predict simultaneously the relevant tissue type and the functionally important variants for a given disease. A secondary objective is to validate these approaches through the analysis of data on cancer and autoimmune disease. The central hypothesis is that loci identified through GWAS tag functional SNPs that cause disease by altering transcription factor binding sites, thereby dysregulating genes in the relevant tissue type (e.g., pre-neoplastic tissue for cancer and immune cells for autoimmune disease). The rationale that underlies this research is that the ENCODE data represents a rich resource for understanding GWAS results and that the methods to be developed here will enable similar analyses of other diseases. The research team is well prepared to undertake the proposed research because of their combined expertise in the conduct and analysis of GWAS, functional and bioinformatics follow-up of GWAS hits, and machine learning approaches to understanding genome-scale data, including transcription factor binding. The central hypothesis will be tested through these aims: 1) Determine whether ENCODE data generated in the appropriate tissue type can be used to find putative functional transcriptional regulatory variants at disease-associated loci. This will be achieved by asking whether lymphoma risk SNPs tend to alter transcription factor binding sites and associate with expression of nearby genes in lymphoblastoid cell lines. 2) Using ENCODE data, identify the cell types and tissue(s) important for a given disease. Relevant cell types will be identified by determining those cell types in which genes near disease risk loci are more likely to be expressed in the ENCODE data. 3) Identify putative functional SNPs in GWAS using ENCODE when complete functional genomic data is not available for the appropriate tissue type. To extend these analyses beyond the few cell types extensively studied in ENCODE, DNase hypersensitivity data from the relevant tissue will be linked with ChIP-Seq transcription factor binding data from other tissues to allow identification of variants that alter transcription factor binding. This research is significant because it will provide new insight into the biology of cancer and autoimmune disease. More importantly, it will provide the tools necessary to use the ENCODE data to link disease risk loci with functional variants and potential mechanistic explanations.
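As an illustration of the idea behind aim 2, the following is a minimal sketch of how cell types might be ranked by whether genes near disease risk loci are over-represented among the genes expressed in each cell type. It is not the method proposed in the application: the one-sided hypergeometric test is a stand-in chosen for simplicity, and the function names, gene identifiers, and cell-type labels are all hypothetical. A real analysis would also need to account for linkage disequilibrium, gene density, and how "near" is defined.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def rank_cell_types(expressed_by_cell_type, genes_near_risk_loci, all_genes):
    """Rank cell types by enrichment of expressed genes among genes near risk loci.

    All inputs are hypothetical gene-identifier collections; the test used here
    is an assumption made for illustration, not the application's stated method.
    """
    all_genes = set(all_genes)
    near = set(genes_near_risk_loci) & all_genes
    results = []
    for cell_type, expressed in expressed_by_cell_type.items():
        expressed = set(expressed) & all_genes
        overlap = len(expressed & near)
        # Probability of seeing at least this many expressed genes near risk
        # loci if the genes near risk loci were drawn at random from all genes.
        p = hypergeom_sf(overlap, len(all_genes), len(expressed), len(near))
        results.append((cell_type, overlap, p))
    # Smallest p-value first: the most enriched cell type leads the list.
    return sorted(results, key=lambda r: r[2])


# Toy data (made up): ten genes, three of them near risk loci.
all_genes = [f"g{i}" for i in range(1, 11)]
genes_near_risk_loci = ["g1", "g2", "g3"]
expressed_by_cell_type = {
    "cell_type_A": ["g1", "g2", "g3", "g4"],
    "cell_type_B": ["g4", "g5", "g6", "g7"],
}

ranked = rank_cell_types(expressed_by_cell_type, genes_near_risk_loci, all_genes)
for cell_type, overlap, p in ranked:
    print(f"{cell_type}: overlap={overlap}, p={p:.4f}")
```

In this toy example the cell type that expresses all three genes near risk loci ranks first. With real data, expression calls per cell type would come from the ENCODE data and the resulting p-values would need correction for the number of cell types tested.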

Public Health Relevance

Over the past five years, numerous genetic changes associated with common disease have been identified. The next step in using this information to improve human health is to determine how these changes alter cellular function. Here, methods to integrate data from the ENCODE project with these genetic studies are proposed to enable investigators to generate hypotheses regarding the function of these genetic changes.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZHG1-HGR-M (M2))
Program Officer
Gilchrist, Daniel A
Icahn School of Medicine at Mount Sinai
Schools of Medicine
New York
United States
Han, Buhm; Kang, Eun Yong; Raychaudhuri, Soumya et al. (2014) Fast pairwise IBD association testing in genome-wide association studies. Bioinformatics 30:206-13
Xu, Jin; Lu, Zhigang; Xu, Mingming et al. (2014) A heroin addiction severity-associated intronic single nucleotide polymorphism modulates alternative pre-mRNA splicing of the μ opioid receptor gene OPRM1 via hnRNPH interactions. J Neurosci 34:11048-66
Cerhan, James R; Berndt, Sonja I; Vijai, Joseph et al. (2014) Genome-wide association study identifies multiple susceptibility loci for diffuse large B cell lymphoma. Nat Genet 46:1233-8