The Miner Suite: State of the Art Bioinformatic Tools and Data Resources

Weinstein, John

Abstract

The Genomics and Bioinformatics Group (GBG) represents the Center for Cancer Research's most substantial enterprise in bioinformatics. That bioinformatic activity is well integrated with the experimental activities of the group. In fact, all of the algorithmic and software developments have been motivated by needs of the group's experimental science. Over the last several years we've developed the Miner Suite as a state-of-the-art, web-based, portable, principally Java set of tools and databases focused on needs that we found were not being served by other bioinformatic developments. Our web site, http://discover.nci.nih.gov, has been featured in Science and, locally, in the NIH Catalyst.

The following is a list of the Miner Suite modules, ending with four that went online in 2007 (CellMiner, SpliceMiner, AffyProbeMiner, and LeFEminer):

MedMiner accelerates searching and organization of PubMed literature 5- to 10-fold (L. Tanabe, et al.).
MatchMiner translates among gene identifiers for lists of genes from microarrays and other high-throughput omic platforms (K. Bussey, et al.).
GoMiner leverages the Gene Ontology for lists of genes (e.g., from microarrays) (B. Zeeberg, et al.).
CIMminer produces Clustered Image Maps (CIMs) (i.e., clustered heat maps), the ever-present icon of postgenomic biology. We introduced CIMs in the early 1990s.
LeadScope/LeadMiner links molecular targets (e.g., in the NCI-60 cancer cell lines) to 27,000 chemical substructures in a fluently browsable format (with P. Blower, et al.).
MIMminer provides searchable electronic forms (eMIMs) of the elegant, scholarly Molecular Interaction Maps developed by K. Kohn, M. Aladjem, and Y. Pommier.
High-Throughput GoMiner extends GoMiner's statistics and visualizations to encompass large sets of microarrays (e.g., from time course studies or clinical trials) (B. Zeeberg, et al.).
AbMiner provides a browsable relational database of available monoclonal antibodies that we have characterized. It includes our quality control results and multiple link-outs (S. Major, et al.).
SmudgeMiner diagnoses spatial artifacts on Affymetrix and spotted arrays (M. Reimers, et al.).
MethMiner provides pattern visualization and statistics for DNA methylation. The program is not yet public (S. Kim, et al.).

Miner Suite programs that went online publicly in 2007 were as follows:

CellMiner provides the NCI-60 and other molecular profile databases in a SQL-searchable relational format, with metadata on the experimental platforms and cell types (U. Shankavaram, S. Varma, et al.; see 2007 publication in Molecular Cancer Therapeutics).
SpliceMiner provides a robust infrastructure for dealing with transcript splice variation (A. Kahn, M. Ryan, et al.; see 2007 publication in BMC Bioinformatics).
AffyProbeMiner provides what we believe to be the best tool for re-mapping Affymetrix probes to achieve sharper results from the commercially available Affymetrix arrays. It also integrates the re-mapped array data with GoMiner and High-Throughput GoMiner (H.-F. Liu, B. Zeeberg, et al.; see 2007 publication in Bioinformatics).
LeFEminer is a web-based implementation of the LeFE (Learner of Functional Enrichment) algorithm, which uses the random forest machine-learning paradigm in a novel way to predict functional relationships from microarray and other high-throughput data types (G. Eichler, et al.; see 2007 publication in Genome Biology).

Those programs, largely designed and implemented under a competed contract with SRA International (contract #263-01-D-0050, CIO-SP2 Delivery Order 2313028), are freely available and used by thousands of investigators worldwide. The success of the programs is attributable in part to our adoption of the Agile software development paradigm, which promotes close, iterative interaction between software engineers, biologists, and bioinformaticists. That success is also partially attributable to adoption of Unit and System Testing methods; whenever code is re-deposited in our version control system, it's automatically subjected to >1700 tests to minimize the chance that the changes have broken something elsewhere in the overall code base. The development team won an SRA Project Excellence Award for the Miner Suite (1 of 4 awarded out of >700 competing projects).

We're integrating the Miner Suite applications with a variety of public bioinformatic software projects, including caBIG, The Cancer Genome Atlas, and the CGEMS genome-wide association project.

caBIG: We were awarded two caBIG pilot grants (one to caBIG-enable GoMiner, the other to caBIG-enable our NCI-60 databases). The group, particularly lead software engineer David Kane, has made strong contributions to caBIG's Integrative Cancer Research and Architecture Working Groups.
The Cancer Genome Atlas: Working with TCGA development staff, we used SpliceMiner as the basis for development of bioinformatic infrastructure to support the integration of genotypic/phenotypic data of multiple types. That infrastructure has now been adopted by TCGA's Data Integration Committee for that purpose.
CGEMS: Working with bioinformaticists in the NCI Division of Cancer Epidemiology and Genetics (DCEG), we have developed a program package provisionally named ChromMiner for visual integration of SNP data from the genome-wide association studies with phenotypic data on the cancers (including gene expression and comparative genomic hybridization data). That program package is central to a proposal, which I was instrumental in drafting, to combine the epidemiological/genotypic expertise of DCEG with the phenotypic expertise of CCR. The proposal is titled "DCEG/CCR Plan for Follow-up of Genomic Regions of Association Identified by CGEMS."

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC010842-01
Application #: 7592989
Support Year: 1
Fiscal Year: 2007
Total Cost: $1,051,653

Institution

Name: National Cancer Institute Division of Basic Sciences
Country: United States

Related projects


NIH 2008 Z01 CA	The Miner Suite: State of the Art Bioinformatic Tools and Data Resources	Weinstein, John N. / National Cancer Institute Division of Basic Sciences	$336,151
NIH 2007 Z01 CA	The Miner Suite: State of the Art Bioinformatic Tools and Data Resources	Weinstein, John N. / National Cancer Institute Division of Basic Sciences	$1,051,653

Publications

Okabe, Mitsunori; Szakacs, Gergely; Reimers, Mark A et al. (2008) Profiling SLCO and SLC22 genes in the NCI-60 cancer cell lines to identify drug uptake transporters. Mol Cancer Ther 7:3081-91

Eichler, Gabriel S; Reimers, Mark; Kane, David et al. (2007) The LeFE algorithm: embracing the complexity of gene expression in the interpretation of microarray data. Genome Biol 8:R187

Lee, Jae K; Havaleshko, Dmytro M; Cho, Hyungjun et al. (2007) A strategy for predicting the chemosensitivity of human cancers and its application to drug discovery. Proc Natl Acad Sci U S A 104:13086-91

Kahn, Ari B; Ryan, Michael C; Liu, Hongfang et al. (2007) SpliceMiner: a high-throughput database implementation of the NCBI Evidence Viewer for microarray splice variant analysis. BMC Bioinformatics 8:75

Shankavaram, Uma T; Reinhold, William C; Nishizuka, Satoshi et al. (2007) Transcript and protein expression profiles of the NCI-60 cancer cell panel: an integromic microarray study. Mol Cancer Ther 6:820-32

Liu, Hongfang; Zeeberg, Barry R; Qu, Gang et al. (2007) AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets. Bioinformatics 23:2385-90

Martin, Scott E; Jones, Tamara L; Thomas, Cheryl L et al. (2007) Multiplexing siRNAs to compress RNAi-based screen size in human cells. Nucleic Acids Res 35:e57
