Radiation Oncology Branch - Microarray Facility

Camphausen, Kevin

Abstract

Core services: As an integral component of the ROB research framework, the MCF has contributed to the scientific excellence of the ROB investigators. The Core has been utilized by all ROB investigators and a few external collaborators, leading to many joint publications. The MCF provides three main services: 1) Affymetrix microarray service, 2) bioinformatics and biostatistics analysis, and 3) information resources and management. a. Microarray service. The core specializes in the use and analysis of microarrays for large-scale gene expression profiling and genetic profiling, and provides full core lab services for Affymetrix GeneChip arrays. The process begins with a research hypothesis and the development of a sound experimental design in consultation with the primary researcher. Basic gene expression service begins when the investigator provides the core with total RNA. The core first runs the samples on gel electrophoresis or on an Agilent 2100 Bioanalyzer to assess sample quality. If the samples pass QC, the core processes them through each step, culminating in a comprehensive dataset. Users are provided with all the array files associated with their runs. In addition, the files are uploaded to a shared microarray database (mAdb) system (http://madb.nci.nih.gov) developed by the National Cancer Institute's (NCI) Center for Cancer Research (CCR) in collaboration with the Center for Information Technology. b. Data analysis service. Collaborative statistical support on multidisciplinary projects is provided to basic and clinical research investigators. Analysis of high-throughput studies involves a wide range of statistical methods and begins with consultation regarding experimental design (Table 1b). Microarray results are directly influenced by the experimental design; therefore, it is suggested that the researcher control for as many experimental factors as possible. Arrays are initially evaluated for basic quality control.
In the case of Affymetrix chips, a set of bacterial gene spike-in controls (bioC, bioD, and cre) is used to evaluate hybridization efficiency. A set of internal housekeeping gene controls (GAPDH and β-actin) is used to assess the quality of the synthesis of labeled cRNA. The data are normalized by Lowess (for cDNA arrays), the MAS 5.0 scaling method, Robust Multichip Average (RMA), or other methods, depending on attributes of the experiment or the user's preference. The data are formatted to show probe-gene identities aligned with log2 signal intensity measurements and detection p-values for each gene of each array experiment. For sequencing data, the computational analysis comprises alignment to a reference genome, quality filtering, annotation that maps aberrations to cancer-specific and population-based sequences, and comparative analysis. Whenever possible, data are normalized, copy number is computed and segmented, and loss of heterozygosity is calculated. Data are also visualized to show gains and losses across chromosomes. The primary users are consulted after the initial analysis of the data, which may include, for example, Student's t-test or the Wilcoxon rank-sum test to assess the significance of differential gene expression between two groups, or ANOVA or the Kruskal-Wallis test for more than two groups, followed by false discovery rate correction (e.g., the Benjamini-Hochberg method). After consultation with the primary user, advanced analytical objectives are pursued, such as data visualization and mining, differential expression analyses, pathway analyses, and in silico independent validation of results (meta-analysis using information available in the literature and in expression databases), to draw meaningful conclusions from the omics technologies. Support for clinical studies includes estimation of risk ratios and odds ratios, survival analysis, and Cox regression analysis.
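The false discovery correction named above can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. This is not the core's actual pipeline: the function name `benjamini_hochberg`, the example p-values, and the 0.05 cutoff are assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper for illustration; not taken from the core's software.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up per-gene p-values, e.g. from t-tests between two groups.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
q = benjamini_hochberg(p)

# Genes whose adjusted p-value falls at or below an assumed 5% FDR cutoff.
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(q)
print(significant)
```

In this made-up example, the third and fourth genes have raw p-values below 0.05 but adjusted values above it, which is the kind of result the correction is meant to flag when many genes are tested at once.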
The analytical methods for high-throughput data are continuously evolving. To keep abreast of the latest developments and best practices, the core collaborates with members of the CCR bioinformatics core on statistical methods as needed. Personnel also attend scientific conferences and participate in advanced workshops at the meetings. In addition, the core organizes weekly journal club discussions of innovations within the bioinformatics community. c. Information resources and management service. The core has unique interdisciplinary expertise and is in a good position to generate and maintain information resources, including customized technology development projects not found in commercial software. Microarray processes generate large amounts of data. Information including the images, image quantitation data, and attributes of the samples is quite valuable. Generally, only a fraction of the results generated by microarray experiments can be further investigated by a single workgroup. Unlike the research samples generated by the investigators, the information generated by these experiments represents a resource that should be widely shared. The maintenance and dissemination of this resource require specialized equipment and expertise. For the gene expression microarrays processed in the core, the results of all array experiments are stored in the mAdb shared repository, which is accessed and used by ROB investigators and their collaborators. In addition, upon publication, the raw and processed data analyzed at the core will be submitted to public repositories as required by the scientific journals (e.g., Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo)) and will be available to researchers globally. After publication, information and processed data are maintained at the microarray core.
These data, along with data from public resources, are used to build user-friendly tools with two advantages: 1) presenting data under common standards, and 2) mining the integrated data to add confidence in a particular result and to help conserve resources by redirecting investigators' research efforts in other directions. This is accomplished using MySQL database components configured with custom software written by bioinformatics staff in the core. The database is maintained locally and backed up remotely on NCI-maintained servers.