To realize the promise of the Human Genome Project, we need not only the parts list of all the genes, but also a comprehensive understanding of how they function together. Along with genes, our genome contains all the signals necessary for controlling gene expression in response to environmental and developmental stimuli. These regulatory processes are governed by short sequence motifs that modulate gene usage at every level. Despite their prevalence, regulatory motifs have been particularly challenging to identify, owing to their short length and the varying distances at which they can act. Although they are of extraordinary importance, their systematic understanding remains one of the major challenges of modern biology. In the proposed work, we use comparative genomics of multiple mammals to systematically identify and characterize regulatory motifs in the human genome based on their evolutionary conservation. We have pioneered a powerful new approach for de novo motif discovery using genome-wide conservation, and have successfully applied it to four yeast genomes, twelve fly genomes, and human promoters and 3'-UTRs. Here we expand this methodology to undertake motif discovery across the entire human genome: (1) we develop methods that use dozens of mammalian species for motif discovery and characterization; (2) we identify significant motif combinations and grammars and reveal their functional roles; and (3) we discover functional regions of motif clustering and study the role of motifs in specifying enhancer function. The proposed work is timely, given that NHGRI's sequencing efforts now encompass more than 30 mammalian genomes, sequenced specifically to help understand the human genome. Moreover, large-scale systematic experimentation is providing the functional information necessary to inform and validate our findings.
By revealing the underlying sequence patterns that govern gene usage, we complement these ongoing efforts and provide access to the concrete building blocks of human gene regulation. This will enable researchers worldwide to link new genes into pathways by their co-regulation and to elucidate the role of non-coding SNPs in regulatory diseases, and it will lead to new tests and therapeutics for modern medicine. A global map of regulatory motifs constitutes a necessary knowledge infrastructure for a comprehensive understanding of regulation, development, and disease.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG004037-09
Application #: 8920441
Study Section: Special Emphasis Panel (ZRG1-GGG-D (90))
Program Officer: Pazin, Michael J
Project Start: 2007-09-28
Project End: 2015-10-31
Budget Start: 2015-09-01
Budget End: 2015-10-31
Support Year: 9
Fiscal Year: 2015
Total Cost: $55,688
Indirect Cost: $19,990

Name: Massachusetts Institute of Technology
Department:
Type: Organized Research Units
DUNS #: 001425594
City: Cambridge
State: MA
Country: United States
Zip Code: 02139

Loughran, Gary; Jungreis, Irwin; Tzani, Ioanna et al. (2018) Stop codon readthrough generates a C-terminally extended variant of the human vitamin D receptor with reduced calcitriol response. J Biol Chem 293:4434-4444
Ernst, Jason; Melnikov, Alexandre; Zhang, Xiaolan et al. (2016) Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat Biotechnol 34:1180-1190
Jungreis, Irwin; Chan, Clara S; Waterhouse, Robert M et al. (2016) Evolutionary Dynamics of Abundant Stop Codon Readthrough. Mol Biol Evol 33:3108-3132
Wang, Xinchen; Tucker, Nathan R; Rizki, Gizem et al. (2016) Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. Elife 5:
Marbach, Daniel; Lamparter, David; Quon, Gerald et al. (2016) Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods 13:366-70
Ward, Lucas D; Kellis, Manolis (2016) HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res 44:D877-81
Bekelis, Kimon; Kerley-Hamilton, Joanna S; Teegarden, Amy et al. (2016) MicroRNA and gene expression changes in unruptured human cerebral aneurysms. J Neurosurg 125:1390-1399
Ma, Jiao; Diedrich, Jolene K; Jungreis, Irwin et al. (2016) Improved Identification and Analysis of Small Open Reading Frame Encoded Polypeptides. Anal Chem 88:3967-75
Chibnik, Lori B; Yu, Lei; Eaton, Matthew L et al. (2015) Alzheimer's loci: epigenetic associations and interaction with genetic factors. Ann Clin Transl Neurol 2:636-47
Rogers, Julia M; Barrera, Luis A; Reyon, Deepak et al. (2015) Context influences on TALE-DNA binding revealed by quantitative profiling. Nat Commun 6:7440

Showing the most recent 10 out of 101 publications