The broad goal of this project is to develop and apply computational tools for detecting, modeling and understanding biologically important sequence patterns, called motifs, encoded in the genome, in RNA and in proteins. Sequence motifs carry much of the information essential to the correct functioning of cells. For example, motifs in genomic DNA contain information that helps to regulate gene expression. Sequence motifs in RNA encode splice junctions and regulatory information such as microRNA binding sites. At the protein level, sequence motifs may participate in enzymatic binding sites, provide anchors for a protein structure or mediate post-translational modifications such as phosphorylation by kinases. We represent motifs with statistical models that capture local sequence patterns while allowing for naturally occurring variability. Since 2011, more than 33,000 unique users have accessed the MEME Suite web portal, and the number of users has been growing steadily. As of June 28, 2013, the papers describing the MEME Suite had been cited 6,827 times, according to Google Scholar. In the proposed project, we aim to improve the core algorithms in the MEME Suite, add significant new functionality to the Suite, and improve the robustness, reliability and usability of the software. In particular, we will significantly enhance the core motif discovery algorithm to scale to larger data sets, to identify new types of motifs, and to provide more accurate statistical confidence estimates. We will add functionality to the Suite to allow users to identify and characterize motifs associated with post-translational protein modifications. We will also carry out a series of software engineering and usability improvements that will greatly enhance the overall user experience. Our software can be locally installed or run remotely through our web portal to perform a diverse set of analyses on large, complex genomic and proteomic data sets.
It is in widespread use by scientists around the world.
We aim to continue to maintain and develop this software, facilitating scientific discovery and leading to insights into a wide spectrum of fundamental processes in molecular biology and human disease.

Public Health Relevance

This project will improve the existing, widely used MEME Suite software, which enables biologists to discover and understand how nature uses patterns, called motifs, in DNA, RNA and protein molecules. Identifying and accurately characterizing functional motifs allows scientists to understand how genes are turned on and off and how proteins carry out their functions in the cell. Such knowledge will help us build models of the basic molecular mechanisms of the cell and, in particular, molecular-scale models of disease processes.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM103544-10
Application #: 8828716
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Project Start: 2009-09-28
Project End: 2018-03-31
Budget Start: 2015-04-01
Budget End: 2016-03-31
Support Year: 10
Fiscal Year: 2015
Total Cost: $358,109
Indirect Cost: $82,055

Name: University of Washington
Department: Genetics
Type: Schools of Medicine
DUNS #: 605799469
City: Seattle
State: WA
Country: United States
Zip Code: 98195

Overman, Jeroen; Fontaine, Frank; Moustaqil, Mehdi et al. (2017) Pharmacological targeting of the transcription factor SOX18 delays breast cancer in mice. Elife 6:
Ilsley, Melissa D; Gillinder, Kevin R; Magor, Graham W et al. (2017) Krüppel-like factors compete for promoters and enhancers to fine-tune transcription. Nucleic Acids Res 45:6572-6588
Grant, Charles E; Johnson, James; Bailey, Timothy L et al. (2016) MCAST: scanning for cis-regulatory motif clusters. Bioinformatics 32:1217-9
O'Connor, Timothy; Bodén, Mikael; Bailey, Timothy L (2016) CisMapper: predicting regulatory interactions from transcription factor ChIP-seq data. Nucleic Acids Res :
Gillinder, Kevin R; Ilsley, Melissa D; Nébor, Danitza et al. (2016) Promiscuous DNA-binding of a mutant zinc finger protein corrupts the transcriptome and diminishes cell viability. Nucleic Acids Res :
Bailey, Timothy L; Johnson, James; Grant, Charles E et al. (2015) The MEME Suite. Nucleic Acids Res 43:W39-49
Lim, Jonathan W C; Donahoo, Amber-Lee S; Bunt, Jens et al. (2015) EMX1 regulates NRP1-mediated wiring of the mouse anterior cingulate cortex. Development 142:3746-57
Ma, Wenxiu; Noble, William S; Bailey, Timothy L (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc 9:1428-50
Lesluyes, Tom; Johnson, James; Machanick, Philip et al. (2014) Differential motif enrichment analysis of paired ChIP-seq experiments. BMC Genomics 15:752
Tanaka, Emi; Bailey, Timothy L; Keich, Uri (2014) Improving MEME via a two-tiered significance analysis. Bioinformatics 30:1965-73
