Large-scale gene expression profiling studies provide valuable information about how the expression of individual genes changes in response to environmental toxicants and stressors. However, investigators often struggle to make sense of these expression changes from a global perspective, because the tools for integrating individual genes into functional pathways and networks remain underdeveloped. Statistical and data mining approaches are urgently needed to make optimal use of these high-dimensional data. This need grows as the size and complexity of genomics data increase and the biological questions to be addressed become more sophisticated. We have developed a method, called GA/KNN (genetic algorithm/k-nearest neighbor), that selects a subset of genes that can discriminate between different classes of samples. Presently, we are developing methods that combine gene expression data and genomic sequence data to identify genes that may be functionally related and to better understand gene regulation. Toward this goal, we have created a database of promoter sequences for human-mouse gene orthologs. We are testing different sequence alignment algorithms that can be used to identify regions conserved across species. In addition, we have implemented a computational algorithm that scans the promoter sequences in the database for binding sites of known transcription factors. We are also developing algorithms based on the Gibbs sampling technique to identify common motifs (both known and unknown) in the promoter regions of genes that have similar patterns of expression in a dose-response or time-course experiment. We have also developed methods, based on order-restricted statistical inference, for classifying effects on expression over time or dose. In another project, we developed a model for expression data from genes whose expression cycles with the cell cycle, as measured in synchronization protocols.
Our model captures the attenuation as the cells fall out of synchrony across time, by postulating variability in the duration of the cell cycle across cells.
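The attenuation described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the model developed in this project: each cell is assumed to follow a cosine expression profile, cycle durations are assumed to be lognormally distributed, and the function name `population_signal` and all parameter values are hypothetical. Averaging over cells whose cycle lengths differ produces a population-level signal whose successive peaks shrink as the cells drift out of phase.

```python
import math
import random


def population_signal(times, mean_period=16.0, sd_log_period=0.1,
                      n_cells=2000, seed=0):
    """Average a cosine expression profile over many cells.

    Each cell gets its own cycle duration, drawn from a lognormal
    distribution (an illustrative assumption). All cells start in
    phase at time 0, mimicking a synchronized culture.
    """
    rng = random.Random(seed)
    mu = math.log(mean_period)
    periods = [rng.lognormvariate(mu, sd_log_period) for _ in range(n_cells)]
    return [
        sum(math.cos(2.0 * math.pi * t / period) for period in periods) / n_cells
        for t in times
    ]


# Sample the population average at the start and at the nominal
# peak of each of the next three cycles (hypothetical time units).
times = [0.0, 16.0, 32.0, 48.0]
signal = population_signal(times)
for t, s in zip(times, signal):
    print(f"t = {t:5.1f}  mean expression = {s:.3f}")
```

In this sketch the population average equals 1 at time 0, when every cell is in phase, and each later peak is lower than the one before it. Increasing `sd_log_period` makes the decay faster, which is the qualitative behavior the abstract attributes to cell-to-cell variability in cycle duration.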
Peddada, Shyamal D; Lobenhofer, Edward K; Li, Leping et al. (2003) Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics 19:834-41
Li, L; Darden, T A; Weinberg, C R et al. (2001) Gene assessment and sample classification for gene expression data using a genetic algorithm/k-nearest neighbor method. Comb Chem High Throughput Screen 4:727-39
Li, L; Weinberg, C R; Darden, T A et al. (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 17:1131-42