The signal elements in promoter sequences are not well characterized. We developed statistical tests to find nucleotide words (generally of length 8) whose occurrences are localized relative to transcription start sites (TSSs). These words served as "seeds" that were expanded into position-specific scoring matrices (PSSMs) characterizing systems of co-regulated genes. To this end, Dr. Marino-Ramirez collected a database of about 4700 sequences around the TSSs of human genes. The database was exceptionally well characterized and ideal for our statistical study. We used a Poisson scan statistic to determine whether occurrences of a given 8-letter DNA word cluster unusually relative to the TSS. The Poisson scan statistic also identified clusters of significant words. We have developed a database of positionally significant clusters and a Gibbs sampling program, A-GLAM, to further our exploration of transcriptional regulatory elements using anchored alignments. A-GLAM now also includes a post-processing step to find multiple instances of a transcriptional binding element in a single sequence. We are beginning to evaluate Bayesian sampling methods for incorporating positional information into A-GLAM's analysis. We are also validating our results with microarray data and gene ontology information.
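To make the positional clustering test more concrete, here is a minimal sketch in Python of a scan over TSS-anchored sequences. It is not the project's implementation: the function names, the assumption that all sequences have equal length and share TSS-anchored coordinates, the uniform-placement null model, and the crude union-bound approximation to the scan p-value are all illustrative assumptions.

```python
import math

def word_positions(sequences, word):
    """Collect start offsets of `word` (overlaps allowed) across all sequences.

    Assumption: every sequence uses the same TSS-anchored coordinate system.
    """
    positions = []
    for seq in sequences:
        start = seq.find(word)
        while start != -1:
            positions.append(start)
            start = seq.find(word, start + 1)
    return positions

def max_window_count(positions, window):
    """Largest number of offsets that fall inside any `window` consecutive positions."""
    pts = sorted(positions)
    best, lo = 0, 0
    for hi in range(len(pts)):
        # Shrink the window from the left until it spans fewer than `window` positions.
        while pts[hi] - pts[lo] >= window:
            lo += 1
        best = max(best, hi - lo + 1)
    return best

def poisson_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def scan_p_approx(sequences, word, window):
    """Return (max window count, approximate p-value) for positional clustering.

    Null model (assumed): occurrences are spread uniformly over the allowed
    start positions, so a fixed window's count is roughly Poisson. The p-value
    multiplies the single-window tail by the number of windows, which is a
    rough union-bound style approximation, not an exact scan-statistic p-value.
    """
    n_starts = len(sequences[0]) - len(word) + 1
    positions = word_positions(sequences, word)
    k = max_window_count(positions, window)
    mean = len(positions) * window / n_starts
    n_windows = max(1, n_starts - window + 1)
    return k, min(1.0, n_windows * poisson_tail(k, mean))
```

A call such as `scan_p_approx(promoters, "TATAAAAG", window=10)` (with `promoters` a hypothetical list of equal-length strings) returns the largest window count and the approximate p-value; a small value suggests that the word's occurrences concentrate at a particular offset from the TSS.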

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM091704-03
Application #: 7316277
Study Section: (CBB)
Support Year: 3
Fiscal Year: 2006
Name: National Library of Medicine
Country: United States
Tharakaraman, Kannan; Marino-Ramirez, Leonardo; Sheetlin, Sergey L et al. (2006) Scanning sequences after Gibbs sampling to find multiple occurrences of functional elements. BMC Bioinformatics 7:408
Kim, Nak-Kyeong; Tharakaraman, Kannan; Spouge, John L (2006) Adding sequence context to a Markov background model improves the identification of regulatory elements. Bioinformatics 22:2870-5
Tharakaraman, Kannan; Marino-Ramirez, Leonardo; Sheetlin, Sergey et al. (2005) Alignments anchored on genomic landmarks can aid in the identification of regulatory elements. Bioinformatics 21 Suppl 1:i440-8
Marino-Ramirez, Leonardo; Spouge, John L; Kanga, Gavin C et al. (2004) Statistical analysis of over-represented words in human promoter sequences. Nucleic Acids Res 32:949-58
Frith, Martin C; Hansen, Ulla; Spouge, John L et al. (2004) Finding functional sequence elements by multiple local alignment. Nucleic Acids Res 32:189-200