Pattern Recognition for Analysis of Molecular Sequences

Waterman, Michael

Abstract

We propose to continue (A) the development of mathematical, statistical and computer methods for the analysis of DNA, RNA and protein sequences and (B) the application of these methods. The comparison of two or more informational sequences is central to many problems in molecular biology. (1) Finding consensus patterns that define genetic control regions or that determine structure or function is an important example of sequence comparison. An algorithm already developed by my group will be developed further and applied to several new data sets, such as Pol II promoters and RNA splice signals. Careful data analyses should suggest new modifications to the method. New and nontrivial insights into promoter patterns, for example, could result from an unbiased, rigorous analysis with calculated significance levels. (2) Secondary structure of 5S, 16S, and 23S rRNA has been inferred by the phylogenetic method. Consensus and probability results will be developed to solve this problem in a rigorous way. Again, new information about secondary structure could result. (3) T1 catalogs are available for 16S rRNA from many organisms. A careful analysis, based on patterns and the significance of found patterns, will be made. This will constitute a new and entirely unbiased study of divisions such as archaebacteria, eukaryotes, and eubacteria. (4) Recent important results have been established for the exact (extreme value) distribution of long exact matches between random sequences. These distributions are fundamental to pattern recognition in general and allow statistical assessment of found patterns. The distributions will be extended to include results for long matches where mismatches and insertions/deletions are allowed.
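
As an illustration of aim (4), the following is a minimal sketch, not the project's own method. It computes the longest exact match (longest common contiguous substring) between two sequences and compares it with the leading-order scale log_{1/p}(nm). This scale is assumed here for two independent sequences of lengths n and m with independent letters, where p is the probability that two letters match (p = 0.25 for uniform DNA). The function names `longest_exact_match`, `expected_scale`, and `random_dna` are hypothetical and chosen only for this example.

```python
import math
import random


def longest_exact_match(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b.

    Plain dynamic programming in O(len(a) * len(b)) time; cur[j] holds the
    length of the exact match ending at a[i-1] and b[j-1].
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def expected_scale(n: int, m: int, p: float) -> float:
    """Leading-order growth log_{1/p}(n*m), assuming independent letters
    that match with probability p (an assumption of this sketch)."""
    return math.log(n * m) / math.log(1.0 / p)


def random_dna(n: int, rng: random.Random) -> str:
    """Uniform random DNA sequence of length n (illustrative null model)."""
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    n = m = 500
    observed = [
        longest_exact_match(random_dna(n, rng), random_dna(m, rng))
        for _ in range(5)
    ]
    print("observed longest matches:", observed)
    print("log_{1/p}(nm) scale:", round(expected_scale(n, m, 0.25), 2))
```

A match found in real data that is much longer than this scale would be a candidate for a statistically significant pattern. A formal significance level would require the extreme value distribution itself, which this sketch does not compute.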

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM036230-09
Application #: 2178222
Study Section: Genome Study Section (GNM)

Project Start: 1986-01-01
Project End: 1994-12-31
Budget Start: 1994-01-01
Budget End: 1994-12-31
Support Year: 9
Fiscal Year: 1994
Total Cost
Indirect Cost

Institution

Name: University of Southern California
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 041544081

City: Los Angeles
State: CA
Country: United States
Zip Code: 90089

Publications

Kruglyak, S; Tang, H (2001) A new estimator of significance of correlation in time series data. J Comput Biol 8:463-70

Kruglyak, S; Tang, H (2000) Regulation of adjacent yeast genes. Trends Genet 16:109-11

Heyer, L J; Kruglyak, S; Yooseph, S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9:1106-15

Lee, J K; Dancik, V; Waterman, M S (1998) Estimation for restriction sites observed by optical mapping using reversible-jump Markov Chain Monte Carlo. J Comput Biol 5:505-15

Dancik, V; Hannenhalli, S; Muthukrishnan, S (1997) Hardness of flip-cut problems from optical mapping. J Comput Biol 4:119-25

Komatsoulis, G A; Waterman, M S (1997) A new computational method for detection of chimeric 16S rRNA artifacts generated by PCR amplification from mixed bacterial populations. Appl Environ Microbiol 63:2338-46

Agarwala, R; Batzoglou, S; Dancik, V et al. (1997) Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP model. J Comput Biol 4:275-96

Arratia, R; Martin, D; Reinert, G et al. (1996) Poisson process approximation for sequence repeats, and sequencing by hybridization. J Comput Biol 3:425-63

Port, E; Sun, F; Martin, D et al. (1995) Genomic mapping by end-characterized random clones: a mathematical analysis. Genomics 26:84-100

Sun, F; Arnheim, N; Waterman, M S (1995) Whole genome amplification of single cells: mathematical analysis of PEP and tagged PCR. Nucleic Acids Res 23:3034-40

Showing the most recent 10 out of 23 publications
