The accumulation of molecular sequence data is proceeding at an unprecedented pace. Dozens of complete genomes, tens of thousands of proteins, and several hundred distinct nucleic acid and protein structures are now available. The next phase of molecular biology will be increasingly dominated by efforts to characterize, categorize, and analyze these data with the goal of understanding on a molecular basis the content of information and its transfer in biological systems. Our proposal is aimed at achieving a deeper understanding of genome structure, function, and evolution using empirical, descriptive and interactive statistical and computational methods. We focus on four interrelated primary areas.

I. Genome signature and evolutionary relationships. We will continue the evaluation of genome-wide differences and similarities within and among species using the dinucleotide relative abundances as a genome signature. Applications of dinucleotide relative abundance profiles to genome comparisons do not require alignment.

II. Genomic codon usage patterns. Detailed knowledge of codon and residue choices can help in gene prediction, in characterizing properties of a given gene, and in classifying gene families. In conjunction with the previous area, I, we propose new ways of probing constraints on codon usage that have implications for evolution, DNA structure, and vector design.

III. Pairwise and multiple alignments of protein sequences. Multiple alignments achieved by our new methods are interpreted with respect to functional/structural properties and evolution. These alignments will be applied broadly.

IV. Statistical methods for genome analysis. We will seek characterizations of genomic heterogeneity within and among species and will seek extensions that accommodate inhomogeneities for r-scan statistics that assess anomalies in the distribution of specific relevant markers along biomolecular sequences. We will further investigate rare and frequent words, motifs, or compositional biases.

Finally, we will continue the development of versatile code that implements all our computational and statistical methods for sequence analysis.
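
As an illustration of the alignment-free comparison described in area I, the following is a minimal sketch, not the project's own software. It assumes the usual definition of dinucleotide relative abundance, rho_XY = f_XY / (f_X * f_Y), where f_XY is the observed dinucleotide frequency and f_X, f_Y are the component base frequencies. The function names `relative_abundances` and `signature_distance` are hypothetical, the distance (mean absolute difference over the 16 dinucleotides) is one plausible choice assumed here, and the sketch works on a single strand without any reverse-complement symmetrization.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def relative_abundances(seq):
    """Return rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides.

    Assumption: single-strand counts; positions with symbols other
    than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    # Count overlapping dinucleotides made only of unambiguous bases.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("sequence contains no countable dinucleotides")
    rho = {}
    for d in DINUCS:
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono)
        observed = di[d] / n_di
        # If a base is absent the ratio is undefined; report NaN.
        rho[d] = observed / expected if expected > 0 else float("nan")
    return rho


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between two relative abundance profiles.

    Assumption: an illustrative distance, averaged over 16 dinucleotides.
    No alignment of seq_a and seq_b is needed.
    """
    ra = relative_abundances(seq_a)
    rb = relative_abundances(seq_b)
    return sum(abs(ra[d] - rb[d]) for d in DINUCS) / len(DINUCS)
```

Because each profile is computed from the composition of one sequence alone, two genomes of different lengths and gene orders can be compared directly, for example with `signature_distance(genome_a, genome_b)`.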

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM010452-38
Application #
6518796
Study Section
Genetics Study Section (GEN)
Program Officer
Eckstrand, Irene A
Project Start
1979-01-01
Project End
2003-02-28
Budget Start
2002-03-01
Budget End
2003-02-28
Support Year
38
Fiscal Year
2002
Total Cost
$274,817
Indirect Cost
Name
Stanford University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
800771545
City
Stanford
State
CA
Country
United States
Zip Code
94305
Hao, Bingtao; Naik, Abani Kanta; Watanabe, Akiko et al. (2015) An anti-silencer- and SATB1-dependent chromatin hub regulates Rag1 and Rag2 gene expression during thymocyte development. J Exp Med 212:809-24
Macario, Alberto J L; Brocchieri, Luciano; Shenoy, Avinash R et al. (2006) Evolution of a protein-folding machine: genomic and evolutionary analyses reveal three lineages of the archaeal hsp70(dnaK) gene. J Mol Evol 63:74-86
Brocchieri, Luciano; Karlin, Samuel (2005) Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res 33:3390-400
Brocchieri, Luciano; Kledal, Thomas N; Karlin, Samuel et al. (2005) Predicting coding potential from genome sequence: application to betaherpesviruses infecting rats and mice. J Virol 79:7570-96
Karlin, Samuel; Mrazek, Jan; Ma, Jiong et al. (2005) Predicted highly expressed genes in archaeal genomes. Proc Natl Acad Sci U S A 102:7303-8
Karlin, Samuel; Brocchieri, Luciano; Campbell, Allan et al. (2005) Genomic and proteomic comparisons between bacterial and archaeal genomes and related comparisons with the yeast and fly genomes. Proc Natl Acad Sci U S A 102:7309-14
Karlin, Samuel; Theriot, Julie; Mrazek, Jan (2004) Comparative analysis of gene expression among low G+C gram-positive genomes. Proc Natl Acad Sci U S A 101:6182-7
Karlin, Samuel; Barnett, Melanie J; Campbell, Allan M et al. (2003) Predicting gene expression levels from codon biases in alpha-proteobacterial genomes. Proc Natl Acad Sci U S A 100:7313-8
Campbell, Allan (2003) Prophage insertion sites. Res Microbiol 154:277-82
Mrazek, Jan; Gaynon, Lisa H; Karlin, Samuel (2002) Frequent oligonucleotide motifs in genomes of three streptococci. Nucleic Acids Res 30:4216-21

Showing the most recent 10 out of 141 publications