Large-scale gene expression profiling studies provide valuable information about the expression changes of individual genes in response to environmental toxicants/stressors. However, investigators often face the challenge of interpreting these changes from a broader perspective, as the tools for integrating individual genes into functional pathways and networks remain rudimentary. Statistical and data mining approaches are urgently needed to make optimal use of these high-dimensional data. This need becomes greater as the size and complexity of genomics data grow and the biological questions to be addressed become more sophisticated. We have proposed a method called the genetic algorithm/k-nearest-neighbor (GA/KNN) approach. It is a multivariate stochastic search algorithm that selects a subset of genes that can discriminate between different classes of samples, e.g., normal versus tumor tissue, or unexposed versus exposed tissue. This tool has proved able to identify differentially expressed genes and, when used in conjunction with clustering methods, to reveal the existence of subcategories that share characteristic, distinct patterns of response (e.g., tumor subtypes). We have also developed methods for classifying effects on expression over time or dose, based on order-restricted statistical inference. In another project, we developed a non-linear regression model for quantitatively analyzing periodic gene expression in studies of experimentally synchronized cells. The model permits identification of genes whose expression varies with the cell cycle and permits hypothesis testing about biologically meaningful parameters that characterize cycling genes.
Presently, we are developing methods that combine gene expression data and genomic sequence data to identify families of genes that may be functionally related, and to try to understand gene regulation. Towards this goal, we have created a human-mouse gene ortholog promoter sequence data set. We have developed a sequence alignment algorithm for identifying promoter regions that are conserved between the two species. In addition, we have implemented a computational algorithm that scans the promoter sequences in the data set to identify binding sites for known transcription factors. We are also developing algorithms based on a mathematical approach called the Gibbs sampler to identify common motifs (both known and unknown) that are present in a set of human and mouse promoter sequences.
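
As an illustration of the binding-site scan mentioned above, the following is a minimal sketch only. The abstract does not say how candidate sites are scored, so this example assumes a common approach: a log-odds position weight matrix (PWM) slid along each promoter sequence. The function names, the toy count matrix, the pseudocount, the uniform background, and the score threshold are all hypothetical choices made for the example, not details of the project's software.

```python
import math

BASES = "ACGT"


def build_log_odds(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds PWM.

    counts is a list with one dict per motif position, mapping each
    base to its observed count. A uniform background is an assumption.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def column(base, n=10):
    """Toy count column in which one base was observed n times."""
    return {b: (n if b == base else 0) for b in BASES}


# Hypothetical count matrix for a TATAAA-like motif (illustration only).
TOY_COUNTS = [column(b) for b in "TATAAA"]
PWM = build_log_odds(TOY_COUNTS)

# Hypothetical promoter fragment; the threshold is an arbitrary choice.
HITS = scan("GGCCTATAAAGGCGC", PWM, threshold=8.0)
for start, site, score in HITS:
    print(start, site, round(score, 2))
```

In practice the threshold would be calibrated against background sequence, and both strands would usually be scanned; the sketch omits these steps for brevity.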

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Intramural Research (Z01)
Project #: 1Z01ES101765-01
Application #: 7007552
Study Section: (BB)
Support Year: 1
Fiscal Year: 2004
Name: U.S. National Inst of Environ Hlth Scis
Country: United States
Fan, Zheng; Ahn, Mihye; Roth, Heidi L et al. (2017) Sleep Apnea and Hypoventilation in Patients with Down Syndrome: Analysis of 144 Polysomnogram Studies. Children (Basel) 4:
Xu, Zongli; Niu, Liang; Li, Leping et al. (2016) ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Res 44:e20
Li, Yuanyuan; Krahn, Juno M; Flake, Gordon P et al. (2015) Toward predicting metastatic progression of melanoma based on gene expression data. Pigment Cell Melanoma Res 28:453-63
Zhang, Xiaoli; Li, Bing; Li, Wenguo et al. (2014) Transcriptional repression by the BRG1-SWI/SNF complex affects the pluripotency of human embryonic stem cells. Stem Cell Reports 3:460-74
Niu, Liang; Huang, Weichun; Umbach, David M et al. (2014) IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data. BMC Genomics 15:862
Huang, Weichun; Loganantharaj, Rasiah; Schroeder, Bryce et al. (2013) PAVIS: a tool for Peak Annotation and Visualization. Bioinformatics 29:3097-9
Huang, Weichun; Li, Leping; Myers, Jason R et al. (2012) ART: a next-generation sequencing read simulator. Bioinformatics 28:593-4
Abell, Amy N; Jordan, Nicole Vincent; Huang, Weichun et al. (2011) MAP3K4/CBP-regulated H2B acetylation controls epithelial-mesenchymal transition in trophoblast stem cells. Cell Stem Cell 8:525-37
Xu, Mengyuan; Weinberg, Clarice R; Umbach, David M et al. (2011) coMOTIF: a mixture framework for identifying transcription factor and a coregulator motif in ChIP-seq data. Bioinformatics 27:2625-32
Mercier, Eloi; Droit, Arnaud; Li, Leping et al. (2011) An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS One 6:e16432

Showing the most recent 10 out of 29 publications