Transcriptional regulation in the human genome is a highly coordinated process. A major component of this regulation is the interaction between transcription factor proteins (TFs) and cis-regulatory DNA elements. The goal of this project is to computationally predict, and experimentally validate, the DNA sequence motifs that explain promoter function. The project will produce direct functional measurements of sequence motifs at base-pair resolution. These measurements will provide a benchmark for assessing the sensitivity and specificity of motif-prediction algorithms that can then be applied immediately to the whole genome. The results will also help estimate what proportion of transcription factor binding events are functionally relevant. The three aims of our project are:
Aim 1: We will use two machine learning algorithms, support vector machines and random forests, to identify the subset of known transcription factor binding motifs that is most predictive of promoter activity.
Aim 2: We will then use Bayesian networks to select the most predictive motif features. These features include the strength of each motif site, scored with a position-specific scoring matrix (PSSM), and the positions of individual sites relative to each other and to the transcription start site.
Aim 3: We will then mutagenize the informative positions within the 900 sites identified in Aim 2 and measure the resulting promoter activities using transient transfection assays. We also plan to test 100 lower-ranking sites to determine the sensitivity and specificity of our algorithms. In addition, we will develop an oligo competition assay as a new approach to increasing the throughput of experimental motif analysis for the rest of the genome. The data generated in this project will constitute the first systematic functional analysis of TF binding sites at base-pair resolution.
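The motif-selection step of Aim 1 can be illustrated with the support vector machine half only. The abstract does not specify the kernel, the feature encoding, or how promoter activity is turned into labels, so the sketch below assumes a linear SVM trained with the Pegasos stochastic sub-gradient method, +1/-1 motif presence/absence features, and binary active/inactive promoter labels; motifs are then ranked by the magnitude of their learned weights. The motif names and toy data are hypothetical, and the random forest half is not shown.

```python
import itertools
import random

def train_linear_svm(X, y, lam=0.05, epochs=200, seed=0):
    """Pegasos stochastic sub-gradient solver for a linear SVM without a bias term.

    X: feature vectors, one per promoter (here, +1/-1 motif presence/absence).
    y: labels in {-1, +1} (here, hypothetical inactive/active promoter calls).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # L2 shrinkage step
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def rank_motifs(motif_names, w):
    """Order motifs by the magnitude of their learned weight (largest first)."""
    ranked = sorted(zip(motif_names, w), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in ranked]

# Toy data (illustrative only): every +1/-1 combination of three hypothetical
# motifs, with promoter activity determined entirely by MOTIF_A.
motifs = ["MOTIF_A", "MOTIF_B", "MOTIF_C"]
X = [[float(v) for v in combo] for combo in itertools.product([-1, 1], repeat=3)]
y = [int(x[0]) for x in X]
w = train_linear_svm(X, y)
ranking = rank_motifs(motifs, w)
print(ranking[0])  # MOTIF_A
```

In practice a library implementation such as scikit-learn's `LinearSVC`, together with `RandomForestClassifier.feature_importances_` for the random forest half, would be the more usual choice; the hand-written solver is used here only to keep the sketch dependency-free.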
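The two Aim 2 feature types, PSSM site strength and site position relative to the transcription start site, can be computed as sketched below. This sketch assumes a log2-odds PSSM built from aligned example sites with a pseudocount and a uniform background, and it scans only the forward strand. The example sites, promoter sequence, and TSS index are hypothetical. The Bayesian network feature selection itself is not shown.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PSSM (one dict per motif column) from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pssm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pssm

def best_site(pssm, promoter, tss_index):
    """Scan the forward strand only; return (best score, start offset relative to the TSS)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), None
    for start in range(len(promoter) - width + 1):
        window = promoter[start:start + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start - tss_index

# Hypothetical aligned sites and promoter; the TSS is assumed to lie at index 15.
pssm = build_pssm(["TATAAA", "TATAAA", "TATATA", "TATAAG"])
score, offset = best_site(pssm, "GCGCTATAAAGCGCGCGC", tss_index=15)
print(offset)  # -11: the site starts 11 bp upstream of the TSS
```

The spacing between two sites, the other positional feature mentioned in Aim 2, is then simply the difference between their TSS-relative offsets.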
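The sensitivity and specificity estimate in Aim 3 reduces to a confusion-matrix calculation. The sketch below assumes that top-ranked sites count as predicted functional and lower-ranking sites as predicted nonfunctional, and that a site is called functional when mutating it changes promoter activity by at least a hypothetical fold threshold (1.5 here) in either direction. The abstract specifies neither the calling rule nor the threshold, and the activity values are illustrative, not project data.

```python
from dataclasses import dataclass

@dataclass
class SiteResult:
    site_id: str
    predicted_functional: bool  # e.g. top-ranked site vs. lower-ranking control
    wt_activity: float          # wild-type promoter activity (arbitrary units)
    mut_activity: float         # activity after mutating the site

def is_functional(r, fold_threshold=1.5):
    """Hypothetical call: functional if mutation changes activity >= fold_threshold either way."""
    ratio = r.mut_activity / r.wt_activity
    return ratio <= 1.0 / fold_threshold or ratio >= fold_threshold

def sensitivity_specificity(results, fold_threshold=1.5):
    """Compare predicted labels with observed mutagenesis calls."""
    tp = fp = tn = fn = 0
    for r in results:
        observed = is_functional(r, fold_threshold)
        if r.predicted_functional and observed:
            tp += 1
        elif r.predicted_functional:
            fp += 1
        elif observed:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Illustrative values only.
results = [
    SiteResult("s1", True, 100.0, 40.0),   # loss of activity: functional
    SiteResult("s2", True, 100.0, 95.0),   # no clear effect
    SiteResult("s3", True, 100.0, 180.0),  # gain of activity: functional
    SiteResult("s4", False, 100.0, 105.0),
    SiteResult("s5", False, 100.0, 50.0),  # missed by the predictor
    SiteResult("s6", False, 100.0, 90.0),
    SiteResult("s7", False, 100.0, 110.0),
]
sens, spec = sensitivity_specificity(results)
print(round(sens, 3), spec)  # 0.667 0.75
```

Counting both loss and gain of activity as functional reflects that a TF binding site may act as either an activator or a repressor element; a loss-only rule would be an equally plausible assumption.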

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 7U01HG004561-02
Application #: 7615950
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O1))
Program Officer: Good, Peter J
Project Start: 2007-09-28
Project End: 2010-06-30
Budget Start: 2008-01-01
Budget End: 2008-06-30
Support Year: 2
Fiscal Year: 2007
Total Cost: $495,625
Indirect Cost:
Organization Name: University of Massachusetts Medical School Worcester
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Medicine
DUNS #: 603847393
City: Worcester
State: MA
Country: United States
Zip Code: 01655

Publications:
Wang, Jie; Zhuang, Jiali; Iyer, Sowmya et al. (2012) Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res 22:1798-812
ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57-74
Hoffman, Michael M; Buske, Orion J; Wang, Jie et al. (2012) Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods 9:473-6
Whitfield, Troy W; Wang, Jie; Collins, Patrick J et al. (2012) Functional analysis of transcription factor binding sites in human promoters. Genome Biol 13:R50
Hung, Jui-Hung; Whitfield, Troy W; Yang, Tun-Hsiang et al. (2010) Identification of functional modules that correlate with phenotypic difference: the influence of network topology. Genome Biol 11:R23
Fu, Yutao; Sinha, Manisha; Peterson, Craig L et al. (2008) The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet 4:e1000138