The Biodata Mining and Discovery Section has been actively involved in a variety of NIAMS research projects, the following in particular:
- Identification of causal mutations in families affected with immunodysregulatory diseases
- Investigation of genes involved in Inclusion Body Myositis using whole exome sequencing
- Mutation screening in patients with NEMO-like syndrome
- Studies on expression signatures of autoinflammatory diseases including NOMID, CANDLE, PAPA, Panniculitis and STING
- Applying RNA-Seq to SLE: identifying distinct gene expression profiles associated with high levels of auto-reactive IgE antibodies in systemic lupus erythematosus
- Gene expression profiling in patients with cryopyrin-associated periodic syndromes
- Targeted re-sequencing of the familial Mediterranean fever gene MEFV
- Studies on early replicating fragile sites that contribute to genome instability
- Thymocyte development and emigration proteins
- Transcription factors that shape the active enhancer landscape of T cell populations
- Effect of cutaneous retinoic acid levels on hair follicle development and down-growth
- Homeostatic tissue responses in skin biopsies from NOMID patients with constitutive overproduction of IL-1β
- Study of the role of the vitamin D receptor as a signaling regulator in development of the tooth root and differentiation of associated cell types using RNA-Seq
- Gene expression profiling on tissue inhibitor metalloproteinase 1 in Th1 and Th17 cells

Major computational approaches and methods developed are highlighted below.

Development of computational pipeline for whole exome sequencing (WES) data processing and quality assessment
The pipeline has been developed to process sequencing reads generated by WES. It combines publicly available computational tools such as FastQC, BWA, Picard, GATK, ANNOVAR, SnpEff and KING with in-house scripts in Perl, shell and R. It generates QC metrics that can be used to estimate the false-positive and false-negative rates of WES experiments. In addition, the pipeline can detect a potential sample mix-up and discrepancies in gender, ethnicity or family relationship. The final output is a list of fully annotated variants discovered in each sample. The validity and robustness of the pipeline have been tested on more than 100 WES samples processed so far.

Design of mutational analysis workflow for WES experiments
A typical WES experiment generates about 20,000 coding variants per sample. To identify pathogenic mutations likely to be responsible for the disease, it is necessary to filter variants based on disease prevalence in the population, the functional impact of the variants and the possible inheritance modes. Such a method has been developed and applied successfully in a number of families affected with immunodysregulatory diseases. A few examples are de novo mutations found in genes such as LYN, STING and DHX9.

Method to detect rare somatic mutations in WES data
A number of immunodysregulatory diseases are known to be caused by somatic mutations: only a subset of cells in a sample harbors the mutation. Such mutations are difficult to uncover in WES experiments, because the standard method assumes that the cell population in a sample is homogeneous. A method has been developed that relies on raw sequencing read counts as an indication of potential somatic mutations. Subsequent analysis can then be applied to prioritize the candidate mutations. This approach has successfully identified a somatic mutation in the NLRP3 gene from a NOMID family trio.

A computational approach to study epigenetic landscape and its regulation
This approach has been designed and developed to study the epigenetic landscape across multiple biological conditions and in relation to the relevant gene expression data. It scans and summarizes tag density in multiple genomic intervals around the TSS (transcription start site) and TTS (transcription termination site) of every gene, for multiple epigenetic marks under multiple biological conditions. The resulting tag density matrices are then subjected to k-means clustering to group genes by distinctive epigenetic profiles. The clustering results are combined with the relevant gene expression data to produce a heatmap of epigenetic landscape against gene expression profile. This approach has been applied to study the epigenetic landscape, epigenetic cluster switching in particular, for all genes as well as for genes differentially expressed under specific biological conditions.

Further development and automation of GRO-Seq data analysis methods
Methods have been further developed for GRO-Seq (nuclear run-on assay followed by sequencing). Developed and tested procedures have been largely automated with Bash and Python to efficiently carry out specific tasks, including removing ribosomal RNA sequences, aligning ribo-free sequences to a genome, generating strand-specific files viewable in a genome browser, and identifying statistically significant strand-specific peaks marking transcripts that are being actively transcribed. In addition, a customized computational solution has been designed and developed to calculate the strand-specific tag density around the TSS, gene body and TES, and to dynamically generate per-gene graphical profiles of strand-specific tag density for a given set of genes of specific biological interest.

Development of an Oracle prototype database for storing NGS sample data
This NGS sample-focused Oracle database is being developed in response to the rapid growth in the number of NGS samples in recent years: over 5000 in the last three years alone at the NIAMS IRP. The database stores fundamental sequencing data including run ID, lane, sample ID, process ID, reference genome, index, project ID, researcher name, PI name, and number of QC-passing reads. The database is currently being tested and will soon be deployed. The prototype database will serve as a core that can be expanded to include more data types.

Pathway Analysis
Methods have been developed for applying Gene Set Enrichment Analysis (GSEA) to determine which pathways are significantly enriched in RNA-Seq expression data. GSEA was developed by the Broad Institute to analyze microarray expression data. Methods were developed to prepare RNA-Seq data for analysis in GSEA, using the GSEA options that best fit the analysis of RNA-Seq data. A set of around 1000 pathways or gene sets from the Molecular Signatures Database has been collected and refined.

RNA-Seq methodology
The results of competing methodologies for analyzing RNA-Seq data were compared, including Cuffdiff with multiple samples per group, Cuffdiff with a single sample per group, and Partek. It was determined that although the results of Cuffdiff with a single sample per group include measures of statistical significance (p-values and q-values), these statistics are unreliable. The sensitivity of RNA-Seq expression calls was also compared with the gold standard, qPCR, showing that RPKM values of around 0.1 or read counts of around 5 per transcript are at the lower limit of detection. RNA-Seq expression values above these cutoffs correspond well with qPCR results.

Excel Macros
Excel macros were developed using Microsoft Visual Basic for Applications to perform both standard and customized time-consuming manipulations of microarray data and other common tasks, for use within BMDS and by laboratories within NIAMS.

Mass Spectroscopy Protein Expression Analysis
Methodology was developed to analyze protein expression based on raw data from Mass Spectroscopy experiments. Differential expression was examined based on interferon treatment, proteasome inhibitor treatment or disease state.
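The three filtering criteria named above for the WES mutational analysis workflow (population frequency, functional impact, inheritance mode) can be illustrated with a minimal Python sketch. This is not the section's actual workflow: the `Variant` fields, the set of damaging impact classes, the 0.001 frequency cutoff and the placeholder gene names are all assumptions chosen for illustration, and only the de novo inheritance mode in a parent-child trio is shown.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str          # placeholder gene label (hypothetical)
    pop_freq: float    # population allele frequency (hypothetical field)
    impact: str        # predicted functional class (hypothetical labels)
    child_alt: bool    # alternate allele observed in the affected child
    father_alt: bool   # alternate allele observed in the father
    mother_alt: bool   # alternate allele observed in the mother


# Assumed set of impact classes treated as potentially damaging.
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}


def is_candidate_de_novo(v, max_pop_freq=0.001):
    """Return True if the variant is rare, damaging, and absent in both parents."""
    rare = v.pop_freq <= max_pop_freq
    damaging = v.impact in DAMAGING
    de_novo = v.child_alt and not v.father_alt and not v.mother_alt
    return rare and damaging and de_novo


def filter_variants(variants, max_pop_freq=0.001):
    """Keep only the variants that pass all three filters."""
    return [v for v in variants if is_candidate_de_novo(v, max_pop_freq)]


variants = [
    Variant("GENE_A", 0.0, "missense", True, False, False),    # passes all filters
    Variant("GENE_B", 0.05, "missense", True, False, False),   # too common
    Variant("GENE_C", 0.0, "synonymous", True, False, False),  # not damaging
    Variant("GENE_D", 0.0, "nonsense", True, True, False),     # inherited from father
]
candidates = filter_variants(variants)
print([v.gene for v in candidates])
```

Other inheritance modes (recessive, compound heterozygous, X-linked) would need their own genotype tests in place of the `de_novo` check; the overall shape of the filter would stay the same.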

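The RNA-Seq detection cutoffs reported above (RPKM around 0.1, or around 5 reads per transcript) can be made concrete with a short Python sketch using the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The function names are hypothetical, and requiring both cutoffs to be met is an assumption; the text does not say how the two thresholds were combined.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def is_detectable(read_count, transcript_length_bp, total_mapped_reads,
                  min_rpkm=0.1, min_reads=5):
    """Assumed rule: a call is trusted only if it meets both cutoffs."""
    value = rpkm(read_count, transcript_length_bp, total_mapped_reads)
    return value >= min_rpkm and read_count >= min_reads


# Illustrative numbers only: a 2,000 bp transcript in a 25-million-read library.
print(rpkm(50, 2000, 25_000_000))
print(is_detectable(50, 2000, 25_000_000))
print(is_detectable(2, 2000, 25_000_000))
```

In this illustration, 50 reads on the 2,000 bp transcript give an RPKM of 1.0 and pass both cutoffs, whereas 2 reads give an RPKM of 0.04 and fail both.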
National Institute of Arthritis and Musculoskeletal and Skin Diseases
Kim, Hanna; de Jesus, Adriana A; Brooks, Stephen R et al. (2018) Development of a Validated Interferon Score Using NanoString Technology. J Interferon Cytokine Res 38:171-185
Iglesias-Bartolome, Ramiro; Uchiyama, Akihiko; Molinolo, Alfredo A et al. (2018) Transcriptional signature primes human oral mucosa for rapid wound healing. Sci Transl Med 10:
Kim, Hanna; Brooks, Kristina M; Tang, Cheng Cai et al. (2018) Pharmacokinetics, Pharmacodynamics, and Proposed Dosing of the Oral JAK1 and JAK2 Inhibitor Baricitinib in Pediatric and Young Adult CANDLE and SAVI Patients. Clin Pharmacol Ther 104:364-373
Carlucci, Philip M; Purmalek, Monica M; Dey, Amit K et al. (2018) Neutrophil subsets and their gene signature associate with vascular inflammation and coronary atherosclerosis in lupus. JCI Insight 3:
Giannelou, Angeliki; Wang, Hongying; Zhou, Qing et al. (2018) Aberrant tRNA processing causes an autoinflammatory syndrome responsive to TNF inhibitors. Ann Rheum Dis 77:612-619
Sanchez, Gina A Montealegre; Reinhardt, Adam; Ramsey, Suzanne et al. (2018) JAK1/2 inhibition with baricitinib in the treatment of autoinflammatory interferonopathies. J Clin Invest 128:3041-3052
Bhattacharya, Shreya; Kim, Jin-Chul; Ogawa, Youichi et al. (2018) DLX3-Dependent STAT3 Signaling in Keratinocytes Regulates Skin Immune Homeostasis. J Invest Dermatol 138:1052-1061
Sikora, Keith A; Bennett, Joshua R; Vyncke, Laurens et al. (2018) Germline gain-of-function myeloid differentiation primary response gene-88 (MYD88) mutation in a child with severe arthritis. J Allergy Clin Immunol 141:1943-1947.e9
Tsai, Pei-Fang; Dell'Orso, Stefania; Rodriguez, Joseph et al. (2018) A Muscle-Specific Enhancer RNA Mediates Cohesin Recruitment and Regulates Transcription In trans. Mol Cell 71:129-141.e8
Kang, Heeseog; Jha, Smita; Deng, Zuoming et al. (2018) Somatic activating mutations in MAP2K1 cause melorheostosis. Nat Commun 9:1390
