The Biodata Mining and Discovery section has been actively involved in a variety of NIAMS research projects, in particular:
- A study showing the generation of pathogenic Th17 cells in the absence of TGF-beta signaling
- A deep sequencing analysis that identifies the genomic targets of the cytidine deaminase AID and its cofactor RPA in B lymphocytes
- A systems biology analysis of PFAPA syndrome
- A copy number variation study that identifies a LEPREL1 (P3H2) intron 1 deletion associated with protection from multiple inflammatory diseases
- Homeostatic tissue responses in skin biopsies from NOMID patients with constitutive overproduction of IL-1-beta
- Opposing regulation of the locus encoding IL-17 through direct, reciprocal actions of STAT3 and STAT5
- IL-27 priming of T cells controls IL-17 production in trans via induction of PD-L1
- Neural crest deletion of Dlx3 recapitulates features of Tricho-Dento-Osseous syndrome
- Combining microarray and ChIP-Seq data to screen for key transcription factors associated with folliculin interacting protein 1 in B lymphocytes
- Identifying molecular targets of heterotopic ossification following war trauma using RNA-Seq and microRNA-Seq
- Applying RNA-Seq to SLE: identifying distinct gene expression profiles associated with high levels of auto-reactive IgE antibodies in systemic lupus erythematosus

Major computational approaches and methods developed are highlighted below.

The development of a Peak Assignment and Profile Search Tool (PAPST)
Based on our extensive experience in analyzing ChIP-Seq data, PAPST was developed to combine several of the most useful previously developed data analysis methods with a unique feature of its own: an easy-to-use, fast profile search of ChIP-Seq data for genes with specific transcription factor binding and epigenetic modifications.
Systematically analyzing post-peak-calling ChIP-Seq data is a major challenge, not only because software tools for the task are scarce, but also because the few existing tools are largely inaccessible to the lab scientists who are ultimately responsible for making sense of the peak-calling results. PAPST was developed for post-peak-calling ChIP-Seq data analysis in response to this challenge. With a few mouse clicks and within seconds, PAPST lets a user identify genes with specific transcription factor (TF) binding and/or epigenetic modification co-localization profiles. This novel feature answers questions such as: which genes have TF1 and TF2 binding and epigenetic mark A in their promoters, and epigenetic marks B and C in their gene bodies? Other quick PAPST analysis results include peak distribution statistics among gene-centered genomic regions and the number of overlapping peaks for all pair-wise sample comparisons. PAPST can also generate microarray-style, gene-centered quantitative ChIP-Seq data with a single mouse click; these data may then be combined with RNA-Seq or microarray data, if available, to facilitate further downstream analysis. A Java-based, platform-independent desktop application, PAPST is user friendly and requires no special computational expertise. Advanced users may also use PAPST as a general genomic-interval search tool to quickly screen any set of genomic features with coordinates, such as genes or a set of TF binding peaks, against any other such features in any combination.
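To make the kind of question a profile search answers more concrete, the following is a minimal Python sketch of a co-localization query over genomic intervals. It is not PAPST's implementation (PAPST is a Java desktop application whose internals are not described here). The names Interval, Gene and profile_search, the sample labels, and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    def overlaps(self, other):
        # Two intervals overlap if they share a chromosome and at least one base.
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


@dataclass(frozen=True)
class Gene:
    name: str
    promoter: Interval
    body: Interval


def profile_search(genes, peaks, promoter_samples, body_samples):
    """Return names of genes whose promoter overlaps a peak from every sample in
    promoter_samples and whose gene body overlaps a peak from every sample in
    body_samples. `peaks` maps a sample label to a list of Interval peaks."""
    def has_peak(region, sample):
        return any(region.overlaps(p) for p in peaks.get(sample, []))

    return [g.name for g in genes
            if all(has_peak(g.promoter, s) for s in promoter_samples)
            and all(has_peak(g.body, s) for s in body_samples)]


# Hypothetical toy data: two genes and five peak sets.
genes = [
    Gene("GeneX", Interval("chr1", 900, 1100), Interval("chr1", 1100, 5000)),
    Gene("GeneY", Interval("chr1", 9900, 10100), Interval("chr1", 10100, 15000)),
]
peaks = {
    "TF1": [Interval("chr1", 950, 1050), Interval("chr1", 9950, 10050)],
    "TF2": [Interval("chr1", 1000, 1080)],
    "markA": [Interval("chr1", 920, 980), Interval("chr1", 9900, 10000)],
    "markB": [Interval("chr1", 2000, 2500)],
    "markC": [Interval("chr1", 3000, 3200), Interval("chr1", 12000, 12500)],
}

# TF1, TF2 and mark A in the promoter; marks B and C in the gene body.
hits = profile_search(genes, peaks, ["TF1", "TF2", "markA"], ["markB", "markC"])
print(hits)
```

The sketch scans every peak for every gene, which is adequate for a toy example; a genome-scale search would likely need sorted peak lists or an interval index to respond within seconds.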
A method that combines microarray data and ChIP-Seq data to screen for key transcription factors
This is a computational strategy in which selected ChIP-Seq data from GEO (Gene Expression Omnibus), after peak calling and peak assignment to genes, are combined with microarray data generated in-house to screen for potentially important transcription factors. The approach counts the genes bound by a given TF among all expressed genes; it then repeats the count among the differentially expressed genes. A Fisher exact test is applied to determine whether the difference between the two results (the number of TF-occupied genes among all expressed genes vs. the number among differentially expressed genes, given the two totals) is statistically significant. TFs with a significantly higher percentage of occupied genes among the differentially expressed genes than among all expressed genes are the potential key regulators for further downstream analysis.

A strategy to identify potential transcription factors that may regulate microRNA targeted genes
This strategy involves the following general steps for analyzing RNA-Seq and microRNA-Seq data:
a) identify differentially expressed genes;
b) identify differentially expressed microRNAs;
c) identify computationally predicted mRNA targets for the differentially expressed microRNAs with multiple methods such as TargetScan, PicTar, miRanda, and mirSVR;
d) identify a reliable set of predicted microRNA targets;
e) identify the overlap between the differentially expressed genes and the predicted microRNA targets; these are the genes for the next step;
f) perform motif enrichment analysis on the promoter sequences of the genes identified in e), using tools such as MEME and DREME.
Transcription factors whose binding sites are enriched in this analysis are the potential regulators of microRNA-targeted genes and may be subjected to ChIP-Seq in follow-up studies.
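The Fisher exact test used in the microarray and ChIP-Seq transcription factor screen described earlier in this section can be illustrated with a short Python sketch. It rests on stated assumptions rather than the section's actual code: differentially expressed genes are treated as a subset of the expressed genes, the test is one-sided (enrichment only), no multiple-testing correction is applied, and the function names, TF labels and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(bound_de, total_de, bound_expressed, total_expressed):
    """One-sided Fisher exact p-value that TF-bound genes are over-represented
    among differentially expressed (DE) genes.

    Assumes DE genes are a subset of the expressed genes, so the p-value is the
    hypergeometric tail P(X >= bound_de) when total_de genes are drawn from
    total_expressed genes, of which bound_expressed are TF-bound.
    """
    if not (0 <= bound_de <= total_de <= total_expressed):
        raise ValueError("inconsistent DE counts")
    if not (bound_de <= bound_expressed <= total_expressed):
        raise ValueError("inconsistent bound-gene counts")
    if bound_expressed - bound_de > total_expressed - total_de:
        raise ValueError("more bound non-DE genes than non-DE genes")

    max_k = min(total_de, bound_expressed)
    unbound = total_expressed - bound_expressed
    tail = sum(comb(bound_expressed, k) * comb(unbound, total_de - k)
               for k in range(bound_de, max_k + 1))
    return tail / comb(total_expressed, total_de)


def screen_tfs(bound_counts, total_de, total_expressed, alpha=0.05):
    """Return (tf, p) pairs with p < alpha, sorted by p-value.

    bound_counts maps a TF label to (bound genes among DE genes,
    bound genes among all expressed genes)."""
    results = []
    for tf, (bound_de, bound_expressed) in bound_counts.items():
        p = fisher_enrichment_p(bound_de, total_de, bound_expressed, total_expressed)
        if p < alpha:
            results.append((tf, p))
    return sorted(results, key=lambda item: item[1])


# Hypothetical counts: 5000 expressed genes, 100 of them differentially expressed.
counts = {
    "TF_A": (40, 400),  # 40% of DE genes bound vs. 8% of all expressed genes
    "TF_B": (8, 400),   # 8% of DE genes bound, same as the background rate
}
candidates = screen_tfs(counts, total_de=100, total_expressed=5000)
print([tf for tf, _ in candidates])
```

In practice a screen over many TFs would usually add a multiple-testing correction, and a library routine such as SciPy's fisher_exact could replace the hand-written tail sum.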

National Institute of Arthritis and Musculoskeletal and Skin Diseases
Palazzo, Elisabetta; Kellett, Meghan D; Cataisson, Christophe et al. (2017) A novel DLX3-PKC integrated signaling network drives keratinocyte differentiation. Cell Death Differ 24:717-730
Furumoto, Yasuko; Smith, Carolyne K; Blanco, Luz et al. (2017) Tofacitinib Ameliorates Murine Lupus and Its Associated Vascular Dysfunction. Arthritis Rheumatol 69:148-160
Benhalevy, Daniel; Gupta, Sanjay K; Danan, Charles H et al. (2017) The Human CCHC-type Zinc Finger Nucleic Acid-Binding Protein Binds G-Rich Elements in Target mRNA Coding Sequences and Promotes Translation. Cell Rep 18:2979-2990
Afzali, Behdad; Grönholm, Juha; Vandrovcova, Jana et al. (2017) BACH2 immunodeficiency illustrates an association between super-enhancers and haploinsufficiency. Nat Immunol 18:813-823
Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I et al. (2017) The effects of shared information on semantic calculations in the gene ontology. Comput Struct Biotechnol J 15:195-211
Layh-Schmitt, Gerlinde; Lu, Shajia; Navid, Fatemeh et al. (2017) Generation and differentiation of induced pluripotent stem cells reveal ankylosing spondylitis risk gene expression in bone progenitors. Clin Rheumatol 36:143-154
Muñoz-Cano, Rosa; Pascal, Mariona; Bartra, Joan et al. (2016) Distinct transcriptome profiles differentiate nonsteroidal anti-inflammatory drug-dependent from nonsteroidal anti-inflammatory drug-independent food-induced anaphylaxis. J Allergy Clin Immunol 137:137-146
Shih, Han-Yu; Sciumè, Giuseppe; Mikami, Yohei et al. (2016) Developmental Acquisition of Regulomes Underlies Innate Lymphoid Cell Functionality. Cell 165:1120-1133
Villarino, Alejandro; Laurence, Arian; Robinson, Gertraud W et al. (2016) Signal transducer and activator of transcription 5 (STAT5) paralog dose governs T cell effector and regulatory functions. Elife 5:
Hirahara, Kiyoshi; Onodera, Atsushi; Villarino, Alejandro V et al. (2015) Asymmetric Action of STAT Transcription Factors Drives Transcriptional Outputs and Cytokine Specificity. Immunity 42:877-89

Showing the most recent 10 out of 43 publications