Finding the individual cis-regulatory elements of a gene is an important first step toward understanding its regulation and function. We would like to take the analysis one step further. We hypothesize that genes sharing a similar pattern of motifs in their promoter regions, where the same pattern occurs in both the human and the mouse ortholog, may be functionally related or involved in the same biological process. This hypothesis builds on the notion that a gene's expression may be governed by a set of regulatory motifs acting in combination. We therefore use paired promoter sequences (human/mouse orthologs) to help identify the "true" cis-regulatory motifs. This approach has the potential to identify new genes involved in a known pathway or biological process. Toward this goal, we are taking four steps. First, we created a database of putative human-mouse ortholog promoter sequences. Second, to mine this data set effectively, we developed a sequence alignment algorithm that identifies conserved segments in the paired promoter regions of human and mouse ortholog genes; we assume that transcription factor binding sites are more likely to lie in conserved (i.e., sequence-similar) regions than in non-conserved ones. Third, we implemented an algorithm that scans the promoter sequences in the data set for binding sites of known transcription factors. Finally, we are developing algorithms based on a Markov chain Monte Carlo method called the Gibbs sampler to identify common motifs, both known and unknown, that are present in a set of human and mouse promoter sequences. In the next section, I will explain each of these four steps and illustrate the idea of new gene discovery using a learning set of 17 base excision repair (BER) genes as an example. Besides this new initiative, we are also developing methods for the analysis of microarray and proteomics data.
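The alignment algorithm used in this project is not described here. As a rough illustration of the second step only, the sketch below assumes the human and mouse promoters have already been aligned (rows of equal length, `-` for gaps) and reports stretches where a sliding window's percent identity meets a cutoff. The helper name `conserved_windows`, the 20-column window, and the 70% cutoff are hypothetical choices, not the project's parameters.

```python
from typing import List, Tuple


def conserved_windows(human: str, mouse: str, window: int = 20,
                      min_identity: float = 0.7) -> List[Tuple[int, int]]:
    """Hypothetical helper: merged (start, end) column intervals of conserved windows.

    Assumes `human` and `mouse` are the two rows of a pairwise alignment
    (equal length, '-' for gaps). A window is conserved when the fraction of
    columns carrying the same non-gap base in both rows is >= min_identity.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both rows carry the same base (gaps never count as matches)
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(human.upper(), mouse.upper())]
    segments: List[Tuple[int, int]] = []
    if len(match) < window:
        return segments
    score = sum(match[:window])
    for start in range(len(match) - window + 1):
        if start > 0:
            # slide one column: add the entering column, drop the leaving one
            score += match[start + window - 1] - match[start - 1]
        if score / window >= min_identity:
            end = start + window
            if segments and start <= segments[-1][1]:
                # overlapping or adjacent conserved window: extend the segment
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments
```

For example, two rows that agree over their first 25 columns and differ afterwards yield a single conserved segment beginning at column 0. Under the working assumption stated above, such segments would be the regions searched first for transcription factor binding sites.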
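The scanning algorithm in the third step is not specified. A common way to search for binding sites of known transcription factors is to score every window of a promoter with a log-odds position weight matrix (PWM) on both strands and keep windows above a threshold. The sketch below does exactly that. The count-matrix format, the pseudocount of 0.5, the uniform 0.25 background, and the threshold are assumptions for illustration, and the toy "GATA"-like matrix is hypothetical, not a curated transcription factor profile.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: List[Dict[str, float]], pseudocount: float = 0.5,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2-odds scores vs. a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Report (start, strand, score) for each window scoring >= threshold.

    Both strands are scored; windows containing non-ACGT symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(b not in BASES for b in site):
            continue
        minus = site[::-1].translate(COMPLEMENT)  # reverse complement of the window
        for strand, s in (("+", site), ("-", minus)):
            score = sum(col[b] for col, b in zip(pwm, s))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


# Usage with a hypothetical 4-bp matrix built from 10 "GATA" sites
toy = log_odds_pwm([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(scan("CCGATACC", toy, threshold=6.0))  # one plus-strand hit at offset 2
```

Reporting the strand matters because a site on the minus strand appears in the promoter as the reverse complement of the consensus (here "TATC").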
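The Gibbs-sampler algorithms for the fourth step are still under development and are not described here. For orientation only, the sketch below is the classic single-site Gibbs motif sampler: it assumes exactly one occurrence of a width-`w` motif per sequence and a uniform background, repeatedly holds one sequence out, builds a base-frequency profile from the current sites in the others, and resamples the held-out sequence's site in proportion to its profile likelihood ratio. The function names, pseudocount, iteration count, and seed are illustrative assumptions, and input is assumed to contain only A, C, G, and T.

```python
import random
from typing import Dict, List

BASES = "ACGT"


def profile(seqs: List[str], starts: List[int], width: int, skip: int,
            pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Base frequencies at each motif position, from every site except sequence `skip`."""
    prof = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for i, (seq, s) in enumerate(zip(seqs, starts)):
            if i != skip:
                counts[seq[s + j]] += 1
        total = sum(counts.values())
        prof.append({b: c / total for b, c in counts.items()})
    return prof


def gibbs_motif(seqs: List[str], width: int, iterations: int = 500,
                seed: int = 0) -> List[int]:
    """Single-site Gibbs sampler: returns one motif start per sequence (ACGT-only input)."""
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    # random initial alignment: one candidate site per sequence
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))                 # sequence to update
        prof = profile(seqs, starts, width, skip=i)  # predictive-update step
        seq = seqs[i]
        weights = []
        for s in range(len(seq) - width + 1):
            # likelihood ratio of this window under the profile vs. uniform background
            w = 1.0
            for j in range(width):
                w *= prof[j][seq[s + j]] / 0.25
            weights.append(w)
        # sampling step: draw the new site in proportion to its weight
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling `gibbs_motif(promoters, width=8)` returns one start per promoter; the motif is read off the aligned sites. Practical samplers typically add several random restarts, phase-shift moves, and log-space scoring to avoid underflow on long sequences, none of which is shown here.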
Notably, we have proposed a method called the genetic algorithm/k-nearest-neighbor (GA/KNN) approach. It is a multivariate stochastic search algorithm that selects subsets of genes able to discriminate between classes of samples, e.g., normal versus tumor tissue, or unexposed versus exposed tissue. The tool has proved able to identify differentially expressed genes and, when used together with clustering methods, to reveal subcategories of samples that share characteristic response patterns (e.g., important tumor subtypes) and may be etiologically distinct.
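A compact sketch of the GA/KNN idea under simplifying assumptions, not the published implementation: each chromosome is a fixed-size subset of gene indices; its fitness is the leave-one-out accuracy of a k-nearest-neighbor classifier (Euclidean distance, majority vote) that uses only those genes; each generation keeps the fitter half, and children are produced by drawing genes from the union of two parents (crossover) plus an occasional gene swap (mutation). The subset size, population size, k, generation count, and mutation rate are illustrative, and the published method may differ in these and other details.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def knn_loo_accuracy(X: List[List[float]], y: List[str],
                     genes: Sequence[int], k: int = 3) -> float:
    """Leave-one-out accuracy of k-NN (Euclidean, majority vote) on the chosen genes only."""
    correct = 0
    for i in range(len(X)):
        dists = sorted(((math.dist([X[i][g] for g in genes], [X[j][g] for g in genes]), y[j])
                        for j in range(len(X)) if j != i), key=lambda t: t[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def ga_knn(X: List[List[float]], y: List[str], n_select: int = 5, pop_size: int = 20,
           generations: int = 30, k: int = 3, mutation_rate: float = 0.5,
           seed: int = 0) -> List[int]:
    """Evolve gene subsets; return the fittest subset found (sorted gene indices)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    if n_select >= n_genes or pop_size < 4:
        raise ValueError("need n_select < number of genes and pop_size >= 4")

    def fitness(chromosome: List[int]) -> float:
        return knn_loo_accuracy(X, y, chromosome, k)

    population = [rng.sample(range(n_genes), n_select) for _ in range(pop_size)]
    for _ in range(generations):
        # selection with elitism: the fitter half survives unchanged
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # crossover: draw the child's genes from the union of both parents
            child = rng.sample(sorted(set(p1) | set(p2)), n_select)
            if rng.random() < mutation_rate:
                # mutation: replace one gene with a gene not already in the subset
                outside = [g for g in range(n_genes) if g not in child]
                child[rng.randrange(n_select)] = rng.choice(outside)
            children.append(child)
        population = parents + children
    return sorted(max(population, key=fitness))
```

Because the surviving half is carried over unchanged, the best fitness never decreases from one generation to the next. Leave-one-out k-NN accuracy is a natural fitness for this scheme because it rewards gene subsets whose expression profiles place samples of the same class close together.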

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Intramural Research (Z01)
Project #: 1Z01ES101765-02
Application #: 7195549
Study Section: (BB)
Support Year: 2
Fiscal Year: 2005
Organization name: U.S. National Inst of Environ Hlth Scis
Country: United States

Publications (10 most recent of 29)
Fan, Zheng; Ahn, Mihye; Roth, Heidi L et al. (2017) Sleep Apnea and Hypoventilation in Patients with Down Syndrome: Analysis of 144 Polysomnogram Studies. Children (Basel) 4:
Xu, Zongli; Niu, Liang; Li, Leping et al. (2016) ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Res 44:e20
Li, Yuanyuan; Krahn, Juno M; Flake, Gordon P et al. (2015) Toward predicting metastatic progression of melanoma based on gene expression data. Pigment Cell Melanoma Res 28:453-63
Zhang, Xiaoli; Li, Bing; Li, Wenguo et al. (2014) Transcriptional repression by the BRG1-SWI/SNF complex affects the pluripotency of human embryonic stem cells. Stem Cell Reports 3:460-74
Niu, Liang; Huang, Weichun; Umbach, David M et al. (2014) IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data. BMC Genomics 15:862
Huang, Weichun; Loganantharaj, Rasiah; Schroeder, Bryce et al. (2013) PAVIS: a tool for Peak Annotation and Visualization. Bioinformatics 29:3097-9
Huang, Weichun; Li, Leping; Myers, Jason R et al. (2012) ART: a next-generation sequencing read simulator. Bioinformatics 28:593-4
Abell, Amy N; Jordan, Nicole Vincent; Huang, Weichun et al. (2011) MAP3K4/CBP-regulated H2B acetylation controls epithelial-mesenchymal transition in trophoblast stem cells. Cell Stem Cell 8:525-37
Xu, Mengyuan; Weinberg, Clarice R; Umbach, David M et al. (2011) coMOTIF: a mixture framework for identifying transcription factor and a coregulator motif in ChIP-seq data. Bioinformatics 27:2625-32
Mercier, Eloi; Droit, Arnaud; Li, Leping et al. (2011) An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS One 6:e16432