This proposal presents a combination of high-throughput experimental and computational approaches centered on the identification of functional transcriptional elements in the human genome. The proposed research is consistent with ENCODE's requirement for comprehensive analyses of all target regions, and will be scalable to the whole-genome level. In addition, the results can easily be transferred to the ENCODE database and other public databases for sharing with the other participants in the consortium.
In Aim 1, promoters of all the genes in the ENCODE targets will be comprehensively identified and tested for activity by a screening scheme and a selection scheme.
In Aim 2, a powerful selection method will be used to identify enhancers in the target regions, and their properties will be characterized.
In Aim 3, a combination of chromatin immunoprecipitation and microarray hybridization will be used to identify cis-acting binding sites that are occupied by 12 general transcription factors and chromatin proteins in 24 human cell lines. Sequences of bound segments, along with sequences from promoters and enhancers identified in Aims 1 and 2, will be examined intensively with statistical computational methods to identify motifs that are the likely recognition sites for the proteins.
In Aim 4, all the pan-mammalian DNA elements that are evolutionarily constrained in the target regions will be identified by using advanced computational, evolutionary, and statistical methodology, producing not only a complete parts list, but also high-resolution, quantitative estimates of evolutionary constraint within each element.
In Aim 5, human variation will be identified in a comprehensive sample of the most constrained functional elements by determining the sequence of each element in 48 humans from diverse backgrounds. These sequence data will help answer the question of whether SNPs in noncoding functional elements are as important for human genetics as are cSNPs, and whether the importance of an element for human genetics can be predicted on the basis of evolutionary comparisons. This proposed work constitutes a comprehensive characterization of an important subset of all functional elements in the target regions.