The Biodata Mining and Discovery (BMD) section has been actively involved in a variety of NIAMS research projects, in particular:

• A project aimed at distinguishing among several autoinflammatory diseases with varying degrees of systemic inflammation, including CAPS (cryopyrin-associated periodic syndromes), HIDS (hyper-immunoglobulin D (IgD) syndrome), and TRAPS (tumor necrosis factor receptor-associated periodic syndrome), and at identifying disease-associated genes. This is to be achieved by developing a systems biology based approach that identifies transcription factors potentially involved in signal transduction by pyrin, the protein encoded by MEFV, the gene responsible for Familial Mediterranean Fever (FMF). The project also investigates how microarray profiling can be used to dissect inflammatory and chemotactic pathways in FMF and to identify gene-expression signatures in peripheral blood leukocytes of patients with TRAPS.
• A project aimed at identifying signature genes that determine whether adult stem cells remain progenitors, which can be used in the therapeutic treatment of disease and as cell precursors for tissue engineering, rather than transforming to a tumorigenic phenotype.
• A project that studies the effect of MyoD acetylation on temporal patterns of skeletal muscle gene expression. MyoD is a transcription factor involved in a variety of muscle disorders.
• A study performing genome-wide mapping of histone H3 K4 and K27 trimethylation to help identify overall patterns of epigenetic modifications involved in lineage fate determination of differentiating CD4+ T cells, which are critical in the immune response.

As an integral part of each research project, BMD's involvement can generally be described as follows:

• design and implement statistical data analysis strategies
• perform data analysis
• discuss data analysis strategies and results with other project team members
• train researchers, such as postdocs, in specialized bioinformatics approaches
• help interpret analysis results obtained by other project team members
• design, develop, and implement specialized computer programs required by a project
• evaluate and implement public-domain software as needed by a project
• design and create figures for research publications

In October 2007, our IRP made a strategic investment in Illumina's next-generation sequencing (NGS) Genome Analyzer. This cutting-edge instrument requires dedicated bioinformatics support to analyze the very large amount of data it produces: a single experimental run generates about one terabyte of data and analysis results. BMD has therefore shifted its focus to providing bioinformatics support for research projects based on Genome Analyzer data.

During the past year, BMD provided documented support to 12 research projects from 5 different Laboratories/Branches, developed two major data mining approaches, and designed and implemented 21 computer scripts/programs, as summarized below.

• Gene expression studies of systemic inflammation in patients with cryopyrin-associated periodic syndromes
• CD11c+ cells in Pyrin-null mice
• RNA interference of MEFV in THP-1 cells reveals a role for endogenous pyrin in Toll-like receptor (TLR) signaling that is mediated by the transcription factor IRF2

• Global mapping of histone H3 K4 and K27 trimethylation reveals specificity and plasticity in lineage fate determination of differentiating CD4+ T cells
• Epigenetics of immunoglobulin class recombination and Igh/cMyc chromosomal translocations
• A systems approach using genome-wide microRNA profiling for studying B cell development, activation, and differentiation
• Global mapping of histone H3 K4 and K27 trimethylation of artificial stem cells

• Global gene expression profile of a spontaneously transformed murine mesenchymal stem cell population

• MyoD acetylation influences temporal patterns of skeletal muscle gene expression
• Genome-wide nucleosome profiles of undifferentiated and mature muscle cells

• Identification and analysis of gene-expression signatures in peripheral blood leukocytes of patients with TRAPS
• Dissecting inflammatory and chemotactic pathways in FMF

Development of specific data analysis approaches

Creation and application of multiple predictive LDA models using microarray data from patients with inflammatory diseases

Linear discriminant analysis (LDA), a statistical and machine learning method that identifies the linear combination of variables that best separates two or more classes, was used to build multiple predictive classifiers. Classifier genes were chosen by forward selection, a data-driven model building approach in which variables are added to the model one at a time until predefined stopping criteria are satisfied. The number of genes allowed in a classifier was limited to the number of samples in the smallest group to limit over-fitting. Genes already used in a model were then removed from the gene pool, and the process was repeated to create multiple predictive classifiers of satisfactory quality. Each model was validated with a true leave-one-out cross-validation procedure, in which the model was completely rebuilt without the held-out sample. This model building process identified multiple predictive gene sets. The predictive models were further validated by applying them, combined through a simple voting scheme, to an independent data set.

This approach has been successfully applied to several translational research projects involving patients with NOMID (neonatal-onset multisystem inflammatory disease), HIDS, and TRAPS. For the NOMID project, the sensitivity and specificity of the predictive models are close to 90%.
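A minimal Python sketch of the LDA model-building procedure described above follows. It is an illustration under stated assumptions, not BMD's actual code: the text does not name the forward-selection criterion, so the sketch assumes Wilks' lambda (the criterion of classical stepwise discriminant analysis) with a hypothetical 1% minimum-gain stopping rule, adds a small ridge term for numerical stability, breaks voting ties toward the smallest class label, and runs on simulated data. All function names are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (W) and total (T) scatter matrices; X is samples x genes."""
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for c in np.unique(y):
        d = X[y == c] - X[y == c].mean(axis=0)
        W += d.T @ d
    return W, T

def forward_select(X, y, pool, max_genes, min_gain=0.01, ridge=1e-6):
    """Greedily add the gene that most lowers Wilks' lambda, det(W_S) / det(T_S).

    Wilks' lambda and the min_gain stopping rule are assumptions of this sketch.
    """
    W, T = scatter_matrices(X, y)
    selected, current = [], 1.0
    while len(selected) < max_genes:
        best_gene, best_lam = None, current
        for g in pool:
            if g in selected:
                continue
            idx = np.ix_(selected + [g], selected + [g])
            eye = ridge * np.eye(len(selected) + 1)  # small ridge for stability
            lam = np.exp(np.linalg.slogdet(W[idx] + eye)[1]
                         - np.linalg.slogdet(T[idx] + eye)[1])
            if lam < best_lam:
                best_gene, best_lam = g, lam
        if best_gene is None or best_lam > current * (1 - min_gain):
            break  # no candidate lowers lambda by at least min_gain
        selected.append(best_gene)
        current = best_lam
    return selected

def fit_lda(X, y, genes, ridge=1e-6):
    """LDA with a pooled within-class covariance on the selected genes."""
    Xs, classes = X[:, genes], np.unique(y)
    W, _ = scatter_matrices(Xs, y)
    cov = W / (len(y) - len(classes)) + ridge * np.eye(len(genes))
    return {"genes": genes, "classes": classes,
            "means": np.array([Xs[y == c].mean(axis=0) for c in classes]),
            "inv": np.linalg.inv(cov),
            "log_priors": np.log([np.mean(y == c) for c in classes])}

def predict_lda(m, X):
    """Assign each sample to the class with the largest linear discriminant score."""
    M, inv = m["means"], m["inv"]
    # score_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log prior_k
    scores = (X[:, m["genes"]] @ inv @ M.T
              - 0.5 * np.sum(M @ inv * M, axis=1) + m["log_priors"])
    return m["classes"][np.argmax(scores, axis=1)]

def build_models(X, y, n_models=3):
    """Several LDA classifiers on disjoint gene sets, each capped at the smallest group size."""
    max_genes = min(np.sum(y == c) for c in np.unique(y))
    pool, models = list(range(X.shape[1])), []
    for _ in range(n_models):
        genes = forward_select(X, y, pool, max_genes)
        if not genes:
            break
        models.append(fit_lda(X, y, genes))
        pool = [g for g in pool if g not in genes]  # each gene is used only once
    return models

def loocv_accuracy(X, y, n_models=3):
    """Leave-one-out: gene selection and fitting are fully redone without sample i."""
    correct = np.zeros(n_models)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        for k, m in enumerate(build_models(X[keep], y[keep], n_models)):
            correct[k] += predict_lda(m, X[i:i + 1])[0] == y[i]
    return correct / len(y)

def vote(models, X):
    """Simple majority vote across models; ties go to the smallest class label."""
    preds = np.array([predict_lda(m, X) for m in models]).T  # samples x models
    return np.array([max(np.unique(p), key=lambda c: np.sum(p == c)) for p in preds])

# Hypothetical data: 3 groups x 10 samples, 40 "genes", the first 30 shifted by group.
rng = np.random.default_rng(0)

def simulate(n_per_class):
    y = np.repeat([0, 1, 2], n_per_class)
    X = rng.normal(size=(len(y), 40))
    X[:, :30] += 3.0 * y[:, None]
    return X, y

X_train, y_train = simulate(10)
X_test, y_test = simulate(10)  # stands in for an independent data set
models = build_models(X_train, y_train)
cv = loocv_accuracy(X_train, y_train)
acc = np.mean(vote(models, X_test) == y_test)
print([m["genes"] for m in models], cv, acc)
```

Because gene selection happens inside `build_models`, `loocv_accuracy` repeats it for every held-out sample. This is what separates a true leave-one-out estimate from one computed after selecting genes on all samples, which is optimistically biased.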
A systems biology based approach to identifying transcription factors potentially involved in specific signal transduction pathways

Genes co-regulated by a transcription factor (TF) are expected to carry the corresponding TF binding site in their promoters. The approach starts with a user-defined set of genes from microarray or other experiments. Promoter analysis is then performed to identify TF binding sites that are shared by these genes and statistically enriched among them. Because TF family binding sites are degenerate sequence motifs, a shared TF family binding site, even a statistically significant one, is not necessarily functional. It is therefore critical to obtain other lines of evidence supporting a functional TF binding site. One such piece of evidence, although indirect, is the observation that the specific TFs corresponding to the shared binding sites are themselves significantly regulated. Combining the results of expression data analysis with those of promoter analysis thus points to candidate TFs that may regulate the genes under investigation.

This approach has been successfully applied to the project on RNA interference of MEFV in THP-1 cells, in which interferon regulatory factor 2 (IRF2) was identified as a potential mediator of pyrin signaling.

Research tool development

Illumina provides only general-purpose pipeline software for image analysis, base calling, and sequence alignment of the data produced by its Genome Analyzer. Because the instrument is so new to the market, no commercial software is available for downstream analysis beyond sequence alignment, so customized in-house programs must be developed to satisfy very specific research needs. BMD met this challenge by developing and implementing a number of Python programs.
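The two filtering steps of the systems biology approach to transcription factor identification (promoter-site enrichment, then a check that the cognate TF genes are themselves regulated) can be sketched in Python as follows. This is a hypothetical illustration, not BMD's implementation: the text does not specify the enrichment statistic, so the sketch assumes a one-sided hypergeometric test with a Bonferroni correction, and it assumes binding-site annotations arrive as precomputed gene sets (for example from an upstream motif scan). The function names, site and TF labels, and toy data are all invented.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K with the site, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_sites(gene_set, site_map, background, alpha=0.01):
    """Sites over-represented in gene_set promoters (Bonferroni-adjusted p < alpha).

    site_map: {site: set of genes whose promoter carries the site}, assumed to
    come from an upstream promoter/motif scan.
    """
    gene_set = gene_set & background
    N, n, m = len(background), len(gene_set), len(site_map)
    hits = {}
    for site, carriers in site_map.items():
        carriers = carriers & background
        p = hypergeom_sf(len(carriers & gene_set), N, len(carriers), n)
        p_adj = min(1.0, p * m)  # Bonferroni correction over all sites tested
        if p_adj < alpha:
            hits[site] = p_adj
    return hits


def candidate_tfs(hits, site_to_tfs, regulated):
    """Keep enriched sites whose cognate TF genes are themselves regulated."""
    out = {}
    for site in hits:
        tfs = site_to_tfs.get(site, set()) & regulated
        if tfs:
            out[site] = sorted(tfs)
    return out


# Toy, invented data: 1,000 background genes, 20 genes of interest.
background = {f"g{i}" for i in range(1000)}
gene_set = {f"g{i}" for i in range(20)}
site_map = {
    "SITE_A": {f"g{i}" for i in range(15)} | {f"g{i}" for i in range(500, 530)},
    "SITE_B": {f"g{i}" for i in range(0, 1000, 5)},  # on every 5th promoter
}
site_to_tfs = {"SITE_A": {"TF_A1", "TF_A2"}, "SITE_B": {"TF_B1"}}
regulated = {"TF_A2", "TF_X"}  # TF genes significantly changed in the arrays

hits = enriched_sites(gene_set, site_map, background)
print(hits)
print(candidate_tfs(hits, site_to_tfs, regulated))  # {'SITE_A': ['TF_A2']}
```

In the toy data, SITE_A occurs in 15 of the 20 genes of interest but only 45 of 1,000 background genes, so it is enriched; SITE_B appears at its background rate and is discarded. Of the TFs that bind SITE_A, only TF_A2 is itself regulated, so it is the one candidate reported.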