To realize the promise of the Human Genome Project, we need not only the parts list of all the genes, but also a comprehensive understanding of how they function together. Along with genes, our genome contains all the signals necessary for controlling gene expression in response to environmental and developmental stimuli. These regulatory processes are governed by short sequence motifs, which modulate gene usage at every level. Despite their prevalence, regulatory motifs have been particularly challenging to identify, due to their short length and the varying distances at which they can act. Given their extraordinary importance, understanding them systematically remains one of the major challenges of modern biology. In the proposed work, we use comparative genomics of multiple mammals to systematically identify and characterize regulatory motifs in the human genome based on their evolutionary conservation. We have pioneered a powerful new approach for de novo motif discovery using genome-wide conservation, and successfully applied it in four yeast genomes, twelve fly genomes, and human promoters and 3'-UTRs. Here we expand this methodology to undertake motif discovery across the entire human genome: (1) we develop methods that use dozens of mammalian species for motif discovery and characterization; (2) we identify significant motif combinations and grammars and reveal their functional roles; and (3) we discover functional regions of motif clustering and study the role of motifs in specifying enhancer function. The proposed work is timely, given that NHGRI's sequencing efforts now encompass more than 30 mammalian genomes, chosen specifically to aid in understanding the human genome. Moreover, large-scale systematic experimentation is providing the functional information necessary to inform and validate our findings.
By revealing the underlying sequence patterns that govern gene usage, we complement these ongoing efforts and provide access to the concrete building blocks of human gene regulation. This will enable researchers worldwide to link new genes in pathways by their co-regulation, elucidate the role of non-coding SNPs in regulatory diseases, and lead to new tests and therapeutics for modern medicine. A global map of regulatory motifs constitutes a necessary knowledge infrastructure towards a comprehensive understanding of regulation, development, and disease.