Molecular Sequence Data

Karlin, Samuel

Abstract

The accumulation of molecular sequence data is proceeding at an unprecedented pace. The next phase of molecular biology will be increasingly dominated by efforts to characterize, categorize, and analyze these data, with the goal of understanding molecular sequence information and its significance in biological systems. The investigators' proposal aims at a deeper understanding of genome structure, function, and evolution using empirical, descriptive, and interactive statistical and computational methods. They focus primarily on three interrelated areas:

I. Analysis of codon usage patterns. Detailed knowledge of codon and residue choices can help in gene prediction, in characterizing the properties of a given gene, and in defining gene classes. The investigators propose a broad analysis of codon usage biases for individual genes and gene classes in complete prokaryotic and eukaryotic genomes. In particular, their studies will concern codon preferences in different gene classes, including (i) gene classes characterized by function and/or cellular localization; (ii) classes determined by gene size; (iii) the codons of a gene divided into three parts: the amino-terminal third, the middle third, and the carboxyl-terminal third; (iv) genes encoded on the leading vs. the lagging strand; and (v) classes of horizontally transferred genes identified with the aid of codon bias extremes.

II. Studies of anomalous genes, including alien genes, highly expressed genes, and genes in pathogenicity islands. Of great biological and medical interest, in complete genomes or in extended contigs, are characterizations of alien genes (e.g., laterally transferred genes), of alien gene clusters (e.g., pathogenicity or specialization islands), and of highly expressed genes.

III. Statistical methods for genome sequence analysis. These will include: (a) characterizations of genomic heterogeneity within and between organisms (e.g., in terms of rare and frequent nucleotides, of motifs, or of compositional biases); (b) extensions of r-scan statistics, which assess anomalies in the distribution of markers along sequences; and (c) statistics of recurrent sequences among genomes, characterized by the number of repeat families, their sizes (bp or aa), the spacings between repeats, and the properties of repeat families (intergenic, coding, direct, inverted, mixed).
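
Area I compares codon usage between individual genes, gene classes, and the amino, middle, and carboxyl thirds of genes. The following is a minimal sketch of one such comparison, not the investigators' software. It assumes the standard genetic code, ignores stop codons, and assumes a codon bias measure B(F|G) that, for each amino acid, sums the absolute differences between the synonymous-codon frequencies of a gene or gene set F and a reference set G, weighted by that amino acid's frequency in F. The helper names (codon_counts, split_thirds, codon_bias) are hypothetical.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODE = dict(zip(CODONS, AMINO))

# Group sense codons into synonymous families, one per amino acid.
FAMILIES = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def codon_counts(seq):
    """Count sense codons in an in-frame coding sequence (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if CODE.get(codon, "*") != "*":  # skip stop and ambiguous codons
            counts[codon] += 1
    return counts


def split_thirds(seq):
    """Split an in-frame sequence into amino, middle, and carboxyl thirds by codon."""
    n = len(seq) // 3   # number of complete codons
    k = n // 3          # codons per third; any remainder goes to the last third
    return seq[:3 * k], seq[3 * k:6 * k], seq[6 * k:3 * n]


def codon_bias(f, g):
    """Assumed measure B(F|G) = sum_a p_a(F) * sum_{c in a} |f(c) - g(c)|,
    with f(c), g(c) normalized within each synonymous codon family."""
    total_f = sum(f.values())
    if total_f == 0:
        return 0.0
    bias = 0.0
    for codons in FAMILIES.values():
        nf = sum(f.get(c, 0) for c in codons)
        if nf == 0:
            continue  # an amino acid absent from F contributes nothing
        ng = sum(g.get(c, 0) for c in codons)
        diff = sum(
            abs(f.get(c, 0) / nf - (g.get(c, 0) / ng if ng else 0.0))
            for c in codons
        )
        bias += (nf / total_f) * diff
    return bias


gene = "ATGAAAAAAGCTGCT"
reference = "ATGAAGAAGGCCGCT"
print(codon_bias(codon_counts(gene), codon_counts(reference)))
for part in split_thirds(gene):
    print(part, codon_bias(codon_counts(part), codon_counts(reference)))
```

Under this assumed definition B ranges from 0 (identical synonymous-codon preferences) to 2, so genes or gene thirds whose bias against a genome-wide reference is extreme stand out, which is the kind of signal item (v) and area II rely on.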
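
Item III(b) refers to r-scan statistics for marker distributions along a sequence. As a minimal sketch, assuming markers are given as distinct integer positions on one sequence and that only spacings between successive markers are used (sequence ends are ignored), the r-scan values are the sums of r consecutive spacings: an unusually small minimum suggests marker clustering, and an unusually large maximum suggests a long marker-poor stretch. The function names and the Monte Carlo comparison against uniform random placement are illustrative assumptions, not the project's methods.

```python
import random


def spacings(positions):
    """Distances between successive markers (sequence ends are ignored here)."""
    pts = sorted(positions)
    return [b - a for a, b in zip(pts, pts[1:])]


def r_scan(positions, r):
    """Return (min, max) over all sums of r consecutive spacings."""
    gaps = spacings(positions)
    if not 1 <= r <= len(gaps):
        raise ValueError("need 1 <= r <= number of spacings")
    window = sum(gaps[:r])
    lo = hi = window
    for i in range(r, len(gaps)):
        window += gaps[i] - gaps[i - r]  # slide the r-spacing window by one marker
        lo, hi = min(lo, window), max(hi, window)
    return lo, hi


def min_r_scan_pvalue(positions, length, r, trials=999, seed=0):
    """Monte Carlo p-value for clustering: the fraction of uniform random
    placements of the same number of markers on [0, length) whose minimal
    r-scan is at least as small as the observed one."""
    rng = random.Random(seed)
    observed, _ = r_scan(positions, r)
    n = len(positions)
    hits = sum(
        r_scan(rng.sample(range(length), n), r)[0] <= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


markers = [10, 12, 14, 100, 450, 900]
print(r_scan(markers, 2))  # spacings 2, 2, 86, 350, 450 -> (4, 800)
print(min_r_scan_pvalue(markers, 1000, 2))
```

The sliding window keeps each r-scan update constant-time; simulation is used here only because it is easy to verify, whereas asymptotic distributions for r-scan extremes would avoid it for long sequences.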

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG000335-12
Application #: 2901693
Study Section: Genome Study Section (GNM)
Program Officer: Brooks, Lisa

Project Start: 1988-08-01
Project End: 2002-07-31
Budget Start: 1999-08-01
Budget End: 2000-07-31
Support Year: 12
Fiscal Year: 1999

Institution

Name: Stanford University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 800771545

City: Stanford
State: CA
Country: United States
Zip Code: 94305

Publications

Karlin, Samuel; Theriot, Julie; Mrazek, Jan (2004) Comparative analysis of gene expression among low G+C gram-positive genomes. Proc Natl Acad Sci U S A 101:6182-7

Karlin, Samuel; Barnett, Melanie J; Campbell, Allan M et al. (2003) Predicting gene expression levels from codon biases in alpha-proteobacterial genomes. Proc Natl Acad Sci U S A 100:7313-8

Mrazek, Jan; Gaynon, Lisa H; Karlin, Samuel (2002) Frequent oligonucleotide motifs in genomes of three streptococci. Nucleic Acids Res 30:4216-21

Karlin, Samuel; Chen, Chingfer; Gentles, Andrew J et al. (2002) Associations between human disease genes and overlapping gene groups and multiple amino acid runs. Proc Natl Acad Sci U S A 99:17008-13

Ma, Jiong; Campbell, Allan; Karlin, Samuel (2002) Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures. J Bacteriol 184:5733-45

Karlin, Samuel; Brocchieri, Luciano; Bergman, Aviv et al. (2002) Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci U S A 99:333-8

Chen, Chingfer; Gentles, Andrew J; Jurka, Jerzy et al. (2002) Genes, pseudogenes, and Alu sequence organization across human chromosomes 21 and 22. Proc Natl Acad Sci U S A 99:2930-5

Karlin, Samuel; Brocchieri, Luciano; Trent, Jonathan et al. (2002) Heterogeneity of genome and proteome content in bacteria, archaea, and eukaryotes. Theor Popul Biol 61:367-90

Karlin, Samuel (2001) Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends Microbiol 9:335-43

Brocchieri, Luciano (2001) Phylogenetic inferences from molecular sequences: review and critique. Theor Popul Biol 59:27-40

Showing the most recent 10 out of 74 publications
