The expression patterns of many genes associated with circannual (yearly), circadian (daily), cell-cycle and other periodic biological processes are known to be rhythmic. Conversely, the expression profiles of genes associated with aperiodic biological processes (e.g., tissue repair) are not rhythmic. The functional significance of previously uncharacterized genes, therefore, may be inferred if they exhibit rhythmic patterns of expression synchronized to some ongoing biological process.

DNA microarray experiments are an effective tool for identifying rhythmic genes when a time-series of expression levels is collected. Unlike Northern blots and real-time PCR, which study one gene at a time, DNA microarray hybridization experiments can reveal the expression patterns of entire genomes. Chronobiologists are therefore able to assign putative functional properties to large numbers of genes based on the results of a single experiment. However, the large volume of data generated by hybridization experiments makes manual inspection of individual expression profiles impractical. Separating the subset of genes whose expression profiles are rhythmic from the thousands or tens of thousands that are not requires computer assistance. Ideally, the algorithms for providing such assistance should be efficient and have well-understood performance guarantees.

We propose to design and implement algorithms to identify and characterize the properties of rhythmic genes from DNA microarray hybridization time-series data. Our approach will build on our recent papers in {\em The International Conference on Research in Computational Molecular Biology (RECOMB)}~\cite{recomb02} and the {\em IEEE Computer Society Bioinformatics Conference}~\cite{csb02}. We will specifically address issues of computational complexity, statistical significance and morphological similarity. We hope to aid efforts in functional genomics by developing new algorithmic techniques for the analysis of massively parallel DNA microarray expression data.

We will develop a model-based analysis technique for extracting and characterizing rhythmic expression profiles from genome-wide DNA microarray hybridization data. Our approach, called {\sc rage} (Rhythmic Analysis of Gene Expression), decouples the problems of estimating a pattern's wavelength and phase. Specifically, (I) we propose the {\em autocorrelation} to render our search algorithm phase-independent, and (II) we propose the {\em Hausdorff distance} to measure the similarity of the autocorrelated signals. By attacking the problem of microarray gene-expression time-series analysis with these new methods, we hope to strengthen the computational armamentarium of the chronobiologist. Our {\sc rage} algorithm runs in time linear in the frequency and phase resolution, an improvement over previous quadratic-time approaches. Unlike previous approaches, {\sc rage} uses a true distance metric, based on the Hausdorff distance, for measuring expression profile similarity. This results in better clustering of expression profiles for rhythmic analysis. The confidence of each frequency estimate is computed using $Z$-scores. In preliminary results, {\sc rage} performed better than competing techniques on synthetic and actual DNA microarray hybridization data. Employing results on combinatorial bounds for Voronoi diagrams~\cite{Huttenlocher}, we can replace the discretized phase search in our method with an exact (combinatorially precise) phase search~\cite{recomb02}, resulting in a faster algorithm with no complexity dependence on phase resolution. Thus, one emphasis of this proposal is the development of combinatorially precise, provable algorithms for analyzing expression patterns.
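The two ingredients named above can be illustrated with a small sketch. It is not the RAGE implementation: it assumes evenly sampled profiles, uses a circular autocorrelation, treats each autocorrelation curve as a point set of (lag, value) pairs, and scores candidate periods by brute force against cosine models. The function names `autocorrelation`, `hausdorff` and `estimate_period` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def autocorrelation(x):
    """Normalized circular autocorrelation of a 1-D series (value 1 at lag 0).

    Assumes the series is not constant; a constant series would divide by zero.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Wiener-Khinchin: the autocorrelation is the inverse FFT of the power
    # spectrum. The power spectrum discards phase, so the result does not
    # depend on where in its cycle the profile starts.
    power = np.abs(np.fft.fft(x)) ** 2
    acf = np.fft.ifft(power).real
    return acf / acf[0]

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two curves, each treated as the
    planar point set {(lag, value)}."""
    pa = np.column_stack([np.arange(len(a)), a])
    pb = np.column_stack([np.arange(len(b)), b])
    # Pairwise Euclidean distances between all points of the two sets.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
    # Largest nearest-neighbour distance, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def estimate_period(profile, candidate_periods):
    """Return the candidate period (in samples) whose cosine model has the
    autocorrelation closest to that of the profile, plus all distances."""
    t = np.arange(len(profile))
    target = autocorrelation(profile)
    dists = [
        hausdorff(target, autocorrelation(np.cos(2 * np.pi * t / p)))
        for p in candidate_periods
    ]
    return candidate_periods[int(np.argmin(dists))], dists
```

Because the comparison is made between autocorrelations, no search over phase is needed in this sketch; only the candidate periods are enumerated. Phase could then be estimated in a separate step once the period is fixed, which is the decoupling the paragraph describes.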

Surprisingly, maximum entropy spectral analysis (MESA) has not previously been applied to massively parallel gene expression time-series analysis. Therefore, we will also develop a maximum entropy-based analysis technique for extracting and characterizing rhythmic expression profiles from DNA microarray hybridization data. This approach, called {\sc enrage} (Entropy-based Rhythmic Analysis of Gene Expression), treats the task of estimating an expression profile's periodicity and phase as a simultaneous bicriterion optimization problem. Specifically, a frequency domain spectrum is reconstructed from a time-series of gene expression data, subject to two constraints: (a) the likelihood of the spectrum and (b) the Shannon entropy of the reconstructed spectrum. Unlike Fourier-based spectral analysis, maximum entropy spectral reconstruction is well-suited to signals of the type generated in DNA microarray experiments. The {\sc enrage} algorithm is optimal, running in linear time in the number of expression profiles. Moreover, a preliminary implementation of our algorithm runs an order of magnitude faster than previous methods. In preliminary results, we found that {\sc enrage} performed better than previous methods in identifying and characterizing periodic expression profiles on both synthetic and actual DNA microarray hybridization data. Thus, a second thrust of this proposal is the development of novel signal-processing approaches to analyze gene expression patterns, and their integration with combinatorial algorithms from computational geometry.
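For readers unfamiliar with maximum entropy spectral analysis, the sketch below shows the classical Burg recursion, which fits an autoregressive model to a short series and evaluates the spectrum that model implies. It is a stand-in for illustration only and is not the ENRAGE algorithm: the bicriterion likelihood/entropy formulation described above is not reproduced here. The function names `burg`, `mem_spectrum` and `dominant_period`, the default model order, and the frequency grid are all assumptions made for this example.

```python
import numpy as np

def burg(x, order):
    """Burg's method: return AR coefficients a (with a[0] == 1) and the
    final prediction-error power for a mean-removed series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    f = x.copy()              # forward prediction errors
    b = x.copy()              # backward prediction errors
    a = np.array([1.0])
    err = np.dot(x, x) / len(x)
    for m in range(order):
        ff = f[m + 1:]        # forward errors at times m+1 .. N-1
        bb = b[m:-1]          # backward errors at times m .. N-2
        # Reflection coefficient minimizing forward + backward error power.
        k = -2.0 * np.dot(ff, bb) / (np.dot(ff, ff) + np.dot(bb, bb))
        # Levinson-style update of the AR coefficients.
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        # Compute both new error sequences before overwriting either one.
        f_new = ff + k * bb
        b_new = bb + k * ff
        f[m + 1:] = f_new
        b[m + 1:] = b_new
        err *= 1.0 - k * k
    return a, err

def mem_spectrum(x, order, freqs):
    """Maximum entropy (AR) power spectrum at freqs, in cycles per sample."""
    a, err = burg(x, order)
    lags = np.arange(len(a))
    transfer = np.exp(-2j * np.pi * np.outer(freqs, lags)) @ a
    return err / np.abs(transfer) ** 2

def dominant_period(x, order=4, n_freqs=2000):
    """Period (in samples) at the highest peak of the AR spectrum."""
    freqs = np.linspace(0.005, 0.5, n_freqs)   # skip f = 0 (infinite period)
    psd = mem_spectrum(x, order, freqs)
    return 1.0 / freqs[int(np.argmax(psd))]
```

Calling `dominant_period(profile, order=2)` on an evenly sampled expression profile returns the period, in sampling intervals, at which the reconstructed spectrum peaks. The model order is a tuning choice: too low an order smooths peaks away, while too high an order on a short series can split a single peak in two.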

Budget Start: 2003-05-15
Budget End: 2006-04-30
Fiscal Year: 2003
Total Cost: $75,000
Name: Dartmouth College
City: Hanover
State: NH
Country: United States
Zip Code: 03755