An integrated view of the multi-dimensional cancer genomic data generated by large-scale investigations of tumor genomic alterations, such as The Cancer Genome Atlas (TCGA) project, is expected to greatly facilitate our understanding of cancer etiology. To meet the analytical challenges presented by this effort and to disseminate the results to the cancer research community, we developed the Cancer Genome Workbench (CGWB) (http://cgwb.nci.nih.gov), a web portal that integrates and displays the genome-wide collection of somatic mutation, copy number variation, gene expression and methylation data generated by TCGA. Key discoveries from these multi-platform, high-resolution genomic data, such as recurrent mutations and copy number changes in glioblastoma (GBM), can be visualized in genomic, heatmap, protein, 3D structure and sequence trace views. We are currently working on supporting data generated by next-generation sequencing technologies. The long-term plan for CGWB is to make it the most comprehensive cancer alteration data resource by integrating data across multiple cancer research projects. Our group has used CGWB tools to identify putative mutations in TCGA data, which are subsequently validated, and to provide quality assurance (QA) for data generated by the Genome Sequencing Centers. Using these tools, our group was the first to identify NF1 as one of the most frequently mutated genes in glioblastoma, a result reported in the TCGA network paper published in Nature. TCGA network members also used CGWB to identify core pathways involved in GBM. Mutation analysis for the TCGA project is an ongoing process, and we recently presented the most highly mutated genes among the phase II TCGA gene list to the TCGA steering committee. In addition to the TCGA project, our group is responsible for analyzing mutations for NCI's Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project.
We have recently identified and validated novel recurrent somatic mutations in acute lymphoblastic leukemia (ALL) patients who had poor outcome. These mutations activate the receptor tyrosine kinase pathway, and the availability of an existing inhibitor of the mutated gene suggests that this finding can be translated into therapy for patients with poor outcome. Our group has also been analyzing somatic copy number changes in 300 cell lines commonly used in cancer research, which will provide insight into the differing drug responses observed among these cell lines.

Three complementary approaches are being used to create pathway models: 1) statistical modeling, 2) logical modeling, and 3) computational modeling. The statistical method known as path analysis is being used to model gene expression data. These efforts will be extended to include a collection of pathway models of interest to cancer research, derived from cancer (and normal tissue) data sets. The laboratory is also collaborating with the NCI Center for Bioinformatics (NCICB) and the Cancer Genome Anatomy Project (CGAP) to develop logical models of pathway data. This effort will use databases of biomolecular interactions in human and mouse based on KEGG and BioCarta pathway data. The last strategy being explored within the laboratory is computational modeling. Each element in a pathway is annotated with a set of incoming and outgoing connections that link the gene or complex to other nodes in the system. Setting the state of a node to "on" or "off" triggers propagation of the change throughout the system via the node's dependent connections. The utility of this approach is currently being assessed using expression data. Because there is no single best way to model processes as complex as biologic pathways, these three complementary approaches are being employed and evaluated. Instantiating pathways as code is the first step in developing more complex computational models.
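The computational-modeling idea described above, where each node carries outgoing connections and switching a node "on" or "off" propagates the change through its dependents, can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption for illustration rather than the laboratory's implementation: the `propagate` function, the toy four-node network with nodes `A` to `D`, the signed edges (+1 activating, -1 inhibiting), and the simplifying rule that each node is updated at most once per perturbation (which guarantees termination in networks with feedback loops).

```python
from collections import deque

def propagate(edges, state, node, value):
    """Hypothetical sketch: set `node` to `value` (True = on, False = off)
    and push the change downstream through dependent connections.

    edges: dict mapping a node to a list of (target, sign) pairs,
           sign +1 = activating, -1 = inhibiting (assumed encoding).
    state: dict mapping every node to its current on/off state.
    Returns a new state dict; the input dict is not modified.
    """
    new_state = dict(state)
    new_state[node] = value
    visited = {node}          # assumption: each node is updated at most once
    queue = deque([node])     # breadth-first traversal of dependents
    while queue:
        src = queue.popleft()
        for target, sign in edges.get(src, []):
            if target in visited:
                continue      # first upstream node to reach a target wins
            # An activating edge copies the source state; an inhibiting edge inverts it.
            target_value = new_state[src] if sign > 0 else not new_state[src]
            visited.add(target)
            if new_state[target] != target_value:
                new_state[target] = target_value
                queue.append(target)  # only nodes that changed propagate further
    return new_state

# Hypothetical toy network: A activates B and inhibits C; B and C both activate D.
edges = {"A": [("B", 1), ("C", -1)], "B": [("D", 1)], "C": [("D", 1)]}
initial = {"A": False, "B": False, "C": True, "D": False}

result = propagate(edges, initial, "A", True)
print(result)  # {'A': True, 'B': True, 'C': False, 'D': True}
```

Switching `A` on turns `B` on, turns `C` off through the inhibiting edge, and then turns `D` on via `B`. A real pathway model would need an explicit rule for combining several inputs to one node (for example AND/OR logic), which this sketch deliberately leaves out.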

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC010470-06
Application #: 7733041
Support Year: 6
Fiscal Year: 2008
Total Cost: $363,916
Organization: National Cancer Institute Division of Basic Sciences
Country: United States