Gene expression measurement using microarrays or next-generation sequencing techniques is a popular and useful technology for genomic analysis. The large volume of data generated in these experiments poses challenging problems. Quality control and experimental design remain fundamental issues. Analytical techniques that account for complex experimental designs and minimize artifacts are required. Bioinformaticians must be able to handle large-scale data projects and to process data into a format to which statistical procedures can be applied. A number of statistical and bioinformatics issues remain, and this project attempts to address some of them. Next-generation sequencing techniques are now a popular means of measuring RNA expression (RNA-seq). As with microarrays, a host of technical and quality control issues remain as challenges, in addition to the new statistical problems implied by the change of scale from continuous (microarray fluorescence) to discrete (read counts).

The availability of affordable, high-quality software has been one of the bottlenecks in the analysis of microarray data. To address this need, we have further developed the "MSCL Analyst's Toolbox", written in the JMP software package. This toolbox allows investigators to download Affymetrix microarray data from a central database, normalize and transform the data, inspect it for a variety of outliers or defects, perform a variety of statistical tests to select relevant genes affected in the experiment, and then visualize and classify various patterns of gene expression. In collaboration with over forty investigators in NCI, CC, NHLBI, NINDS, NIAID, NHGRI, NICHD, NIA, NIDDK, and NIDA, this tool has been applied to dozens of microarray studies. The Analyst's Toolbox has now been extended to handle analysis of RNA-seq data, with the inclusion of new data transformations and utility functions.
In addition, the capability to link data from the user's workstation to online databases has recently been added to the Toolbox.

In a collaboration with NHGRI, we are conducting a case-control RNA-seq investigation of transcriptomic differences in coronary artery calcification, based on ClinSeq study samples. We integrated RNA-seq and microarray data from the same individuals and found consistent changes across the two methodologies, which are now candidates for follow-up studies. Analysis of the same experiment has been extended to a possible novel transcript found in coronary artery calcification patients. This finding is being investigated further by our lab and NHGRI.

In a collaboration with NEI, we are analyzing the transcriptome of mouse photoreceptors from embryonic through neonatal to later adult stages. This extensive time series, using both the Affymetrix Exon array and RNA-seq in parallel, allows for high-resolution analysis at the gene and exon levels, and is providing an unparalleled view of transcriptomic changes accompanying important developmental events (e.g. differentiation, eye opening, aging).
The aim is to identify genes that are involved in mammalian aging and that may be relevant to age-related diseases of the eye in humans.

In a collaboration with NIDCR, we are analyzing microbiome data from Leukocyte Adhesion Deficiency (LAD) patients using the Human Oral Microbe Identification Microarray (HOMIM). This study involves one patient with severe LAD, four patients with moderate LAD, and eight healthy controls. Each participant has samples taken from different sites within the mouth. LAD leads to severe periodontal disease along with defective neutrophil adhesion and transmigration into tissues. The goal of this study is to identify a core microbiome that is present in human samples regardless of disease state, and to find microbiota that might be important in the development of severe periodontal disease in LAD patients. A manuscript describing these data is in preparation and is about to be submitted for publication.

In a collaboration with the Clinical Center, we are analyzing metagenomic data from 16S rRNA sequencing of severe aplastic anemia (SAA) patients. This is a longitudinal study in which samples are collected at baseline, three months after treatment, and six months after treatment. Pilot data from two patients at baseline and follow-up have been studied so far. The study will eventually enroll 40 patients and 15 age- and gender-matched controls. The goal of this study is to identify a core microbiome in humans regardless of disease state, and to identify changes in the microbiome of SAA patients between the pre-treatment and post-treatment samples.
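
As an illustration only, the idea of a core microbiome that holds "regardless of disease state" can be sketched as the set of taxa that are prevalent in every study group. The sketch below is not the method used in these studies: the function name core_taxa, the 80% prevalence threshold, the group labels, and the taxon names are all hypothetical assumptions chosen for the example.

```python
def core_taxa(samples, groups, min_prevalence=0.8):
    """Return taxa detected in at least `min_prevalence` of the samples
    in EVERY group (assumed definition of a "core" microbiome).

    samples: dict mapping sample id -> set of detected taxa
    groups:  dict mapping sample id -> group label (e.g. disease state)
    """
    # Collect the per-sample taxon sets for each group.
    by_group = {}
    for sample_id, taxa in samples.items():
        by_group.setdefault(groups[sample_id], []).append(set(taxa))

    core = None
    for group_samples in by_group.values():
        n = len(group_samples)
        all_taxa = set().union(*group_samples)
        # Taxa whose within-group prevalence meets the threshold.
        prevalent = {
            t for t in all_taxa
            if sum(t in s for s in group_samples) / n >= min_prevalence
        }
        # Keep only taxa that are prevalent in all groups seen so far.
        core = prevalent if core is None else core & prevalent
    return core if core is not None else set()


# Hypothetical toy data: two patient samples and two control samples.
samples = {
    "p1": {"taxon_A", "taxon_B"},
    "p2": {"taxon_A", "taxon_C"},
    "c1": {"taxon_A", "taxon_B"},
    "c2": {"taxon_A", "taxon_B"},
}
groups = {"p1": "patient", "p2": "patient", "c1": "control", "c2": "control"}
core = core_taxa(samples, groups)
```

Taxa that are prevalent in one group but not in another (here taxon_B, common in controls only) fall outside the core under this assumed definition, and are the kind of candidates a disease-association analysis would examine separately.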