This application requests support for our research on identifying and classifying patterns within and between nucleic acid and protein sequences. The emphasis is on developing mathematical, statistical, and computing concepts and methods to help assess and interpret molecular sequence features. Our research program will concentrate on six main areas: 1) development of new statistics, means for assessing statistical significance, and methods of data representation for nucleotide and amino acid sequences that can aid in identifying and interpreting molecular sequence relationships; 2) development of efficient and wide-ranging computer algorithms with which to identify significant word relationships among multiple letter sequences; 3) investigation of the nature of codon and amino acid preferences and patterns with respect to different classifications of genes; 4) extensive analysis of the presence or absence of charge concentrations over different categories of genes in many species, and possible interpretations for function and structure; 5) pursuit of new approaches for phylogenetic constructions using partial ordering criteria and concepts of consensus trees; and 6) specific comparative sequence analyses, including detailed studies of (a) multigene families (globins, immunoglobulins), (b) the herpesvirus family, especially cross studies of the complete Epstein-Barr virus and varicella-zoster virus genomes, and (c) the conjunction of a set of retroviruses, hepadnaviruses, and transposon elements. The interplay between theoretical analysis, data analysis, computer algorithms, and interaction with biologists and medical faculty at Stanford has been a key factor in our program. The collaboration between our group and members of the biology and medical departments provides an ideal framework for achieving the research objectives defined in this grant.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
8R01HG000335-03
Application #
3333459
Study Section
Special Emphasis Panel (SSS (F))
Project Start
1988-08-01
Project End
1993-07-31
Budget Start
1990-08-01
Budget End
1991-07-31
Support Year
3
Fiscal Year
1990
Total Cost
Indirect Cost
Name
Stanford University
Department
Type
Schools of Arts and Sciences
DUNS #
800771545
City
Stanford
State
CA
Country
United States
Zip Code
94305
Karlin, Samuel; Theriot, Julie; Mrazek, Jan (2004) Comparative analysis of gene expression among low G+C gram-positive genomes. Proc Natl Acad Sci U S A 101:6182-7
Karlin, Samuel; Barnett, Melanie J; Campbell, Allan M et al. (2003) Predicting gene expression levels from codon biases in alpha-proteobacterial genomes. Proc Natl Acad Sci U S A 100:7313-8
Mrazek, Jan; Gaynon, Lisa H; Karlin, Samuel (2002) Frequent oligonucleotide motifs in genomes of three streptococci. Nucleic Acids Res 30:4216-21
Karlin, Samuel; Chen, Chingfer; Gentles, Andrew J et al. (2002) Associations between human disease genes and overlapping gene groups and multiple amino acid runs. Proc Natl Acad Sci U S A 99:17008-13
Ma, Jiong; Campbell, Allan; Karlin, Samuel (2002) Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures. J Bacteriol 184:5733-45
Karlin, Samuel; Brocchieri, Luciano; Bergman, Aviv et al. (2002) Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci U S A 99:333-8
Chen, Chingfer; Gentles, Andrew J; Jurka, Jerzy et al. (2002) Genes, pseudogenes, and Alu sequence organization across human chromosomes 21 and 22. Proc Natl Acad Sci U S A 99:2930-5
Karlin, Samuel; Brocchieri, Luciano; Trent, Jonathan et al. (2002) Heterogeneity of genome and proteome content in bacteria, archaea, and eukaryotes. Theor Popul Biol 61:367-90
Mrazek, J; Bhaya, D; Grossman, A R et al. (2001) Highly expressed and alien genes of the Synechocystis genome. Nucleic Acids Res 29:1590-601
Karlin, S; Mrazek, J; Campbell, A et al. (2001) Characterizations of highly expressed genes of four fast-growing bacteria. J Bacteriol 183:5025-40

Showing the most recent 10 out of 74 publications