It is estimated that there are approximately 80,000 genes in the human genome (Fields et al., 1994). To turn this genetic blueprint into a functional organism, genes must be expressed in a specific temporal and spatial pattern. Finding the signals that control this expression and understanding their language is one of the major challenges of the post-genome era. Laboratory identification of regulatory elements, modules, and regions in genomic sequences is often an arduous, time-consuming, and expensive process. If specific approaches can be developed, computational analyses promise to accelerate this process at minimal cost. The long-term goal of the proposed research is to develop and apply Bayesian computational methods in bioinformatics that will describe the complete wiring diagram for a genome's transcription regulation system. This description will include four components: 1) the identification of all superfamilies of transcription factors and their classification into functionally related subclasses based on both the DNA recognition motifs and the activator domains; 2) the identification and characterization of a genome's transcriptional regulatory modules and all factor binding elements within them; 3) the full delineation of the connections between factors and their binding elements; 4) a characterization of alternative transcriptional regulatory motifs, including those based on DNA composition and on DNA and RNA structure. These goals will be addressed using Bayesian statistical models and algorithms, the foundations for which we developed during the current award period.
These include Gibbs sampling algorithms to assemble superfamilies of transcription factors and multiply align them; transcription factor classification algorithms; exact Bayesian algorithms for the description of compositional and structural heterogeneity, RNA secondary structure, and phylogenetic footprinting; and a recursive Gibbs sampling HMM for regulatory module identification and characterization.
Newberg, Lee A; Lawrence, Charles E (2009) Exact calculation of distributions on integers, with application to sequence alignment. J Comput Biol 16:1-18
Webb-Robertson, Bobbie-Jo M; McCue, Lee Ann; Lawrence, Charles E (2008) Measuring global credibility with application to local sequence alignment. PLoS Comput Biol 4:e1000077
Carvalho, Luis E; Lawrence, Charles E (2008) Centroid estimation in discrete high-dimensional spaces with applications in biology. Proc Natl Acad Sci U S A 105:3209-14
Thompson, William A; Newberg, Lee A; Conlan, Sean et al. (2007) The Gibbs Centroid Sampler. Nucleic Acids Res 35:W232-7
Newberg, Lee A; Thompson, William A; Conlan, Sean et al. (2007) A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction. Bioinformatics 23:1718-27
Ding, Ye; Chan, Chi Yu; Lawrence, Charles E (2006) Clustering of RNA secondary structures with application to messenger RNAs. J Mol Biol 359:554-71
Conlan, Sean; Lawrence, Charles; McCue, Lee Ann (2005) Rhodopseudomonas palustris regulons detected by cross-species analysis of alphaproteobacterial genomes. Appl Environ Microbiol 71:7442-52
Chan, Chi Yu; Lawrence, Charles E; Ding, Ye (2005) Structure clustering features on the Sfold Web server. Bioinformatics 21:3926-8
Thompson, William; McCue, Lee Ann; Lawrence, Charles E (2005) Using the Gibbs motif sampler to find conserved domains in DNA and protein sequences. Curr Protoc Bioinformatics Chapter 2:Unit 2.8
Newberg, Lee A; McCue, Lee Ann; Lawrence, Charles E (2005) The relative inefficiency of sequence weights approaches in determining a nucleotide position weight matrix. Stat Appl Genet Mol Biol 4:Article13