Only 3% of the human genome is expected to code for proteins. The remaining 97% has been called "junk DNA". As highlighted in research news articles, several recent results indicate that there may be hidden treasure in this "junk". The highlighted findings include the discovery of regulatory signals in minisatellites, novel 3' untranslated region (3'UTR) RNA binding motifs in the C. elegans lin-14 gene, and novel RNA secondary structure regulatory motifs in the 5'UTRs of tRNA synthetase genes of gram-positive bacteria. The novelty of these findings leaves open two important questions. Are these "grotesque deviants" or "first emissaries"? And if they are emissaries, how can their world be discovered? If these recent findings are emissaries, then their identification and characterization will have a major impact on the next phase of the Human Genome Project and on the numerous health benefits expected to be derived from it. We recently described a novel method for the detection of subtle sequence signals and its application to protein sequence alignment (Lawrence, et al., 1993). The strength of this method rests on sampling models based on the physicochemical characteristics of macromolecules and complexes. We have previously applied the predecessor of this method to the identification of gene regulatory elements, but we have only begun to exploit its full potential. The main goal of this research is to adapt these methods for the identification and characterization of novel sequence signals in the non-coding regions of genomes. Specifically, we plan to adapt these methods through three developments: 1) sampling models that focus on characteristics of DNA interactions in complex contexts; 2) sampling models based on the energetics of RNA/RNA interaction; and 3) a set of universally applicable enhancements.
To achieve these ends, we will develop and distribute a software system for the identification and characterization of subtle sequence signals in non-coding regions of genomes.
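The sampling method cited above (Lawrence, et al., 1993) is a Gibbs sampling strategy for locating a shared, subtle signal in a set of sequences. The sketch below is a minimal, simplified site sampler in that spirit, applied to DNA: it assumes exactly one motif occurrence of a fixed width per sequence, uppercase ACGT input, a single background composition, and a pseudocount of 0.5. All function names and parameters are hypothetical illustrations, not the project's software, and refinements such as phase-shift moves or the physicochemical models described above are omitted.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: DNA input in uppercase only


def profile_from_sites(seqs, starts, width, skip, pseudo=0.5):
    """Position weight matrix built from every current site except sequence `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    n = 0
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        n += 1
        for j in range(width):
            counts[j][s[a + j]] += 1
    total = n + pseudo * len(ALPHABET)
    return [{b: col[b] / total for b in ALPHABET} for col in counts]


def background(seqs, pseudo=1.0):
    """Overall base composition, used as the background model (an assumption)."""
    counts = {b: pseudo for b in ALPHABET}
    for s in seqs:
        for ch in s:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}


def gibbs_site_sampler(seqs, width, iters=500, seed=0):
    """Hypothetical sketch: one motif site per sequence; every sequence must be >= width long."""
    rng = random.Random(seed)
    bg = background(seqs)
    # Random initial alignment: one start position per sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Predictive update: profile from all other sequences' sites.
            pwm = profile_from_sites(seqs, starts, width, skip=i)
            # Motif-vs-background likelihood ratio for every candidate start in sequence i.
            weights = []
            for a in range(len(s) - width + 1):
                log_ratio = sum(
                    math.log(pwm[j][s[a + j]] / bg[s[a + j]]) for j in range(width)
                )
                weights.append(math.exp(log_ratio))
            # Sample the new start in proportion to those ratios.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    seqs = ["ACGTTGACGCAACGT", "GGTTGACGCATTAC", "TTGACGCAGGGCAT"]
    print(gibbs_site_sampler(seqs, width=8, iters=200, seed=1))
```

Because each start is resampled from a distribution rather than chosen greedily, the sampler can escape poor initial alignments; in practice several random restarts would be compared, which this sketch leaves out.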

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG001257-03
Application #: 2519133
Study Section: Special Emphasis Panel (ZRG2-GNM (02))
Project Start: 1995-09-20
Project End: 1999-12-31
Budget Start: 1997-09-01
Budget End: 1999-12-31
Support Year: 3
Fiscal Year: 1997
Name: Wadsworth Center
DUNS #: 110521739
City: Menands
State: NY
Country: United States
Zip Code: 12204
Newberg, Lee A; Lawrence, Charles E (2009) Exact calculation of distributions on integers, with application to sequence alignment. J Comput Biol 16:1-18
Webb-Robertson, Bobbie-Jo M; McCue, Lee Ann; Lawrence, Charles E (2008) Measuring global credibility with application to local sequence alignment. PLoS Comput Biol 4:e1000077
Carvalho, Luis E; Lawrence, Charles E (2008) Centroid estimation in discrete high-dimensional spaces with applications in biology. Proc Natl Acad Sci U S A 105:3209-14
Newberg, Lee A; Thompson, William A; Conlan, Sean et al. (2007) A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction. Bioinformatics 23:1718-27
Thompson, William A; Newberg, Lee A; Conlan, Sean et al. (2007) The Gibbs Centroid Sampler. Nucleic Acids Res 35:W232-7
Ding, Ye; Chan, Chi Yu; Lawrence, Charles E (2006) Clustering of RNA secondary structures with application to messenger RNAs. J Mol Biol 359:554-71
Conlan, Sean; Lawrence, Charles; McCue, Lee Ann (2005) Rhodopseudomonas palustris regulons detected by cross-species analysis of alphaproteobacterial genomes. Appl Environ Microbiol 71:7442-52
Chan, Chi Yu; Lawrence, Charles E; Ding, Ye (2005) Structure clustering features on the Sfold Web server. Bioinformatics 21:3926-8
Thompson, William; McCue, Lee Ann; Lawrence, Charles E (2005) Using the Gibbs motif sampler to find conserved domains in DNA and protein sequences. Curr Protoc Bioinformatics Chapter 2:Unit 2.8
Newberg, Lee A; McCue, Lee Ann; Lawrence, Charles E (2005) The relative inefficiency of sequence weights approaches in determining a nucleotide position weight matrix. Stat Appl Genet Mol Biol 4:Article13

Showing the most recent 10 out of 30 publications