The Analysis of Signal Elements in Promoter Sequences.

Spouge, John

Abstract

The signal elements in promoter sequences are not well characterized. Around 2004, Dr. Mariño-Ramírez collected a database of about 4700 sequences around the transcription start sites (TSSs) of human genes, and in 2008 he roughly doubled its size. We then developed tests based on maximal segment score statistics to find nucleotide words (generally of length 8) that are localized relative to the TSS. About 80 of these words occurred in two or three clusters. By validating our results with microarray data and gene ontology information, we showed that the same 8-letter word could have two different biological functions, depending on its position relative to the TSS. Although the positional dependence of sequence function is now accepted, our study was one of the first to show that it is a widespread phenomenon in the human genome. We implemented our methods, which use positional information and theoretically sound Markov models, in the publicly available program A-GLAM, one of the first programs to find transcription factor binding site (TFBS) motifs using both sequence and position. Dr. Mariño-Ramírez has now increased the size of our database to 29,204 sequences. Dr. Spouge and Ms. Acevedo-Luna have extended the statistical methods for words to known TFBS motifs in the JASPAR database, in order to categorize JASPAR motifs according to their positional preference, to use positional preference to discover pairs of TFBSs in putative cis-regulatory modules, and to assign function from the Gene Ontology Database. Drs. Kim, Jayatillake, and Spouge have also developed a model for calling peaks in ChIP-seq data, to identify TFBSs from experimental data, and implemented it in a publicly available program (NEXT-Peak).
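As a rough illustration of the idea of finding words localized relative to the TSS, the sketch below counts where a word starts in a set of TSS-aligned sequences and then finds the contiguous run of positions with the highest total score. It is not the project's actual test: the scoring rule (count minus the mean count minus a `penalty`), the function names, and the assumption that column `i` of every sequence is the same offset from the TSS are all hypothetical, and no p-value for the segment is computed.

```python
def word_position_counts(sequences, word):
    """For each aligned start position, count the sequences containing `word` there.

    Assumes (hypothetically) that all sequences are aligned on the TSS,
    so that index i means the same offset in every sequence.
    """
    n_positions = max(min(len(s) for s in sequences) - len(word) + 1, 0)
    counts = [0] * n_positions
    for seq in sequences:
        for i in range(n_positions):
            if seq[i:i + len(word)] == word:
                counts[i] += 1
    return counts


def maximal_segment(scores):
    """Return (best_sum, start, end) of the highest-scoring contiguous run.

    `end` is exclusive. This is the standard linear-time maximum-subarray
    scan; an all-nonpositive input yields (0.0, 0, 0).
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, score in enumerate(scores):
        if run_sum <= 0:
            # A non-positive running total cannot help; restart here.
            run_sum, run_start = score, i
        else:
            run_sum += score
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_sum, best_start, best_end


def localized_segment(sequences, word, penalty=0.5):
    """Score each position as count - mean - penalty, then find the best run.

    The penalty (an illustrative choice) keeps the average score negative,
    so only positions clearly above background extend a segment.
    """
    counts = word_position_counts(sequences, word)
    if not counts:
        return 0.0, 0, 0
    mean = sum(counts) / len(counts)
    scores = [c - mean - penalty for c in counts]
    return maximal_segment(scores)
```

Calling `localized_segment(sequences, word)` on TSS-aligned sequences returns the segment score together with the half-open range of start positions where the word is most enriched. Judging whether such a score is significant would require the score statistics described in the abstract, which this sketch does not attempt.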

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM091704-11
Application #: 8943237
Support Year: 11
Fiscal Year: 2014

Institution

Name: National Library of Medicine

Related projects


NIH 2017 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2016 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2015 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2014 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine
NIH 2013 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$267,102
NIH 2012 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$303,689
NIH 2011 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$99,936
NIH 2010 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$156,696
NIH 2009 ZIA LM	The Analysis of Signal Elements in Promoter Sequences. Spouge, John / National Library of Medicine	$165,855

Publications

Acevedo-Luna, Natalia; Mariño-Ramírez, Leonardo; Halbert, Armand et al. (2016) Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules. BMC Bioinformatics 17:479

Kim, Nak-Kyeong; Jayatillake, Rasika V; Spouge, John L (2013) NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data. BMC Genomics 14:349

Mariño-Ramírez, Leonardo; Tharakaraman, Kannan; Spouge, John L et al. (2009) Promoter analysis: gene regulatory motif identification with A-GLAM. Methods Mol Biol 537:263-76

Kim, Nak-Kyeong; Tharakaraman, Kannan; Mariño-Ramírez, Leonardo et al. (2008) Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites. BMC Bioinformatics 9:262

Tharakaraman, Kannan; Bodenreider, Olivier; Landsman, David et al. (2008) The biological function of some human transcription factor binding motifs varies with position relative to the transcription start site. Nucleic Acids Res 36:2777-86
