It is estimated that there are approximately 80,000 genes in the human genome (Fields, C., et al. 1994). To turn this genetic blueprint into a functional organism, genes must be expressed in a specific temporal and spatial pattern. Finding the signals that control this expression and understanding their language is one of the major challenges of the post-genome era. Laboratory identification of regulatory elements, modules, and regions in genomic sequences is often an arduous, time-consuming, and expensive process. If specific approaches can be developed, computational analyses promise to accelerate this process at minimal cost. The long-term goal of the proposed research is to develop and apply Bayesian bioinformatics computational methods that will describe the complete wiring diagram for a genome's transcription regulation system. This description will include four components: 1) the identification of all superfamilies of transcription factors and their classification into functionally related subclasses based on both the DNA recognition motifs and the activator domains; 2) the identification and characterization of a genome's transcriptional regulatory modules and all factor binding elements within them; 3) the full delineation of the connections between factors and their binding elements; 4) a characterization of alternative transcriptional regulatory motifs, including those based on DNA composition, and DNA and RNA structure. These goals will be addressed using Bayesian statistical models and algorithms, the foundations for which we developed during the current award period. These include Gibbs sampling algorithms to assemble superfamilies of transcription factors and multiply align them, transcription factor classification algorithms, exact Bayesian algorithms for the description of compositional and structural heterogeneity, RNA secondary structure, and phylogenetic footprinting, and a recursive Gibbs sampling HMM for regulatory module identification and characterization.
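As a rough illustration of the Gibbs sampling idea named in the abstract, the sketch below implements a generic, textbook-style site sampler that looks for one shared motif occurrence per DNA sequence. It is not the project's software: the function names, the uniform background model, the pseudocount, and the fixed motif width are all assumptions made for this example, and it omits the phylogenetic, recursive, and HMM components the abstract describes.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def build_profile(sites, pseudocount=1.0):
    """Column-wise base probabilities for aligned sites, with pseudocounts."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile

def site_weight(segment, profile, background):
    """Likelihood ratio of the motif model versus background for one segment."""
    weight = 1.0
    for col, base in enumerate(segment):
        weight *= profile[col][base] / background[base]
    return weight

def gibbs_site_sampler(seqs, width, iterations=500, seed=0):
    """Sample one motif start position per sequence (needs at least 2 sequences)."""
    rng = random.Random(seed)
    background = {b: 0.25 for b in ALPHABET}  # assumption: uniform background
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except the current one.
            others = [seqs[j][positions[j]:positions[j] + width]
                      for j in range(len(seqs)) if j != i]
            profile = build_profile(others)
            # Score each candidate start, then sample a start in proportion to its score.
            weights = [site_weight(seq[p:p + width], profile, background)
                       for p in range(len(seq) - width + 1)]
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Calling `gibbs_site_sampler(["ACGTTGACAGG", "TTGACATTTCC", "GGCATTGACAA"], width=6)` returns one sampled start position per sequence. Because the procedure is stochastic, a practical analysis would typically run it from several seeds and summarize the sampled alignments rather than trust a single run.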

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG001257-08
Application #: 6693010
Study Section: Special Emphasis Panel (ZRG1-GNM (03))
Program Officer: Good, Peter J
Project Start: 1995-09-20
Project End: 2004-08-31
Budget Start: 2004-01-01
Budget End: 2004-08-31
Support Year: 8
Fiscal Year: 2004
Total Cost: $489
Indirect Cost:
Name: Wadsworth Center
Department:
Type:
DUNS #: 153695478
City: Menands
State: NY
Country: United States
Zip Code: 12204

Newberg, Lee A; Lawrence, Charles E (2009) Exact calculation of distributions on integers, with application to sequence alignment. J Comput Biol 16:1-18
Webb-Robertson, Bobbie-Jo M; McCue, Lee Ann; Lawrence, Charles E (2008) Measuring global credibility with application to local sequence alignment. PLoS Comput Biol 4:e1000077
Carvalho, Luis E; Lawrence, Charles E (2008) Centroid estimation in discrete high-dimensional spaces with applications in biology. Proc Natl Acad Sci U S A 105:3209-14
Newberg, Lee A; Thompson, William A; Conlan, Sean et al. (2007) A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction. Bioinformatics 23:1718-27
Thompson, William A; Newberg, Lee A; Conlan, Sean et al. (2007) The Gibbs Centroid Sampler. Nucleic Acids Res 35:W232-7
Ding, Ye; Chan, Chi Yu; Lawrence, Charles E (2006) Clustering of RNA secondary structures with application to messenger RNAs. J Mol Biol 359:554-71
Conlan, Sean; Lawrence, Charles; McCue, Lee Ann (2005) Rhodopseudomonas palustris regulons detected by cross-species analysis of alphaproteobacterial genomes. Appl Environ Microbiol 71:7442-52
Chan, Chi Yu; Lawrence, Charles E; Ding, Ye (2005) Structure clustering features on the Sfold Web server. Bioinformatics 21:3926-8
Thompson, William; McCue, Lee Ann; Lawrence, Charles E (2005) Using the Gibbs motif sampler to find conserved domains in DNA and protein sequences. Curr Protoc Bioinformatics Chapter 2:Unit 2.8
Newberg, Lee A; McCue, Lee Ann; Lawrence, Charles E (2005) The relative inefficiency of sequence weights approaches in determining a nucleotide position weight matrix. Stat Appl Genet Mol Biol 4:Article13

Showing the most recent 10 out of 30 publications