The Analysis of Signal Elements in Promoter Sequences.

Spouge, John

Abstract

The signal elements in promoter sequences are not well characterized. Around 2004, Dr. Mariño-Ramírez collected a database of about 4700 sequences around the transcription start sites (TSSs) of human genes, and in 2008 he roughly doubled its size. We then developed tests based on maximal segment score statistics to find nucleotide words (generally of length 8) whose occurrences are localized relative to the TSS. About 80 of these words occurred in two or three clusters. By validating our results with microarray data and Gene Ontology information, we showed that the same 8-letter word can have two different biological functions, depending on its position relative to the TSS. Although positional dependency of sequence function is a known phenomenon, our study showed that it is widespread in the human genome.

We implemented our methods, which use positional information and theoretically sound Markov models, in the publicly available program A-GLAM, which finds transcription factor binding site (TFBS) motifs. Drs. Mariño-Ramírez and Spouge and Ms. Acevedo-Luna have extended the statistical methods for words to known TFBS motifs in the JASPAR database, in order to categorize JASPAR motifs according to their positional preference, to use positional preference to discover combinations of TFBSs in putative cis-regulatory modules, and to assign function from the Gene Ontology Database. A manuscript in preparation demonstrates that position relative to the TSS influences the function of transcription factor binding motifs.

Drs. Kim, Jayatillake, and Spouge have also developed a model for calling peaks in ChIP-seq data, to identify TFBSs from experimental data, and have implemented it in a publicly available program (NEXT-Peak). They are currently extending this work to allow for several peaks in a short sequence of DNA.
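As a rough illustration of how a positionally localized word might be detected, the sketch below counts the start positions of a word across TSS-aligned sequences and then finds the contiguous window with the highest total score. It is a minimal sketch under stated assumptions, not the statistic used in the project or in A-GLAM: the function names, the equal-length alignment of the sequences, and the constant `background` subtracted from each count are all hypothetical, and the significance assessment that maximal segment score statistics provide is omitted.

```python
from typing import List, Tuple


def word_position_counts(seqs: List[str], word: str) -> List[int]:
    """Count how many times `word` starts at each position.

    Assumes all sequences have equal length and are aligned so that the
    same index corresponds to the same offset from the TSS.
    """
    n_positions = len(seqs[0]) - len(word) + 1
    counts = [0] * n_positions
    for seq in seqs:
        start = seq.find(word)
        while start != -1:
            counts[start] += 1
            # Step by one so overlapping occurrences are also counted.
            start = seq.find(word, start + 1)
    return counts


def position_scores(counts: List[int], background: float) -> List[float]:
    """Hypothetical scoring: observed count minus an assumed background level."""
    return [c - background for c in counts]


def maximal_segment(scores: List[float]) -> Tuple[float, int, int]:
    """Return (best_sum, start, end) of the highest-scoring contiguous
    segment, with `end` inclusive (Kadane's algorithm)."""
    best, best_start, best_end = scores[0], 0, 0
    running, run_start = 0.0, 0
    for i, x in enumerate(scores):
        if running <= 0:
            # A non-positive prefix cannot help; restart the segment here.
            running, run_start = x, i
        else:
            running += x
        if running > best:
            best, best_start, best_end = running, run_start, i
    return best, best_start, best_end
```

A word whose maximal segment is narrow and scores much higher than expected under a background model would be a candidate for positional localization; judging "much higher" requires the null distribution of the maximal segment score, which this sketch does not compute.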

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM091704-10
Application #: 8746748
Support Year: 10
Fiscal Year: 2013
Total Cost: $267,102

Institution

Name: National Library of Medicine

Related projects


NIH 2017 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2016 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2015 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2014 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2013 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$267,102
NIH 2012 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$303,689
NIH 2011 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$99,936
NIH 2010 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$156,696
NIH 2009 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$165,855

Publications

Acevedo-Luna, Natalia; Mariño-Ramírez, Leonardo; Halbert, Armand et al. (2016) Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules. BMC Bioinformatics 17:479

Kim, Nak-Kyeong; Jayatillake, Rasika V; Spouge, John L (2013) NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data. BMC Genomics 14:349

Mariño-Ramírez, Leonardo; Tharakaraman, Kannan; Spouge, John L et al. (2009) Promoter analysis: gene regulatory motif identification with A-GLAM. Methods Mol Biol 537:263-76

Kim, Nak-Kyeong; Tharakaraman, Kannan; Mariño-Ramírez, Leonardo et al. (2008) Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites. BMC Bioinformatics 9:262

Tharakaraman, Kannan; Bodenreider, Olivier; Landsman, David et al. (2008) The biological function of some human transcription factor binding motifs varies with position relative to the transcription start site. Nucleic Acids Res 36:2777-86
