Large tumor exome sequencing projects have identified a very large number of mutations whose cancer relevance is not yet understood. To begin to address this need, our team has produced two web applications for high-throughput computational analysis of cancer mutations: the Cancer-Related Analysis of VAriants Toolkit (CRAVAT) and the Mutation Position Imaging Toolbox (MuPIT). CRAVAT accepts millions of mutations in a single batch upload and maps them from genomic coordinates to annotated transcripts and proteins. MuPIT currently accepts batch uploads of up to 2500 single-nucleotide variants (SNVs) and maps them from genomic coordinates onto X-ray crystal structures of proteins from the Protein Data Bank (PDB). We propose to combine and harden CRAVAT and MuPIT into a single web application, in which we will substantially improve the tools, user interface, software infrastructure, integration with external data resources and tools used by the community, and support for protected data. The scope of all tools in the web application will be broadened to handle analysis of the full range of small-scale mutation consequence types found in cancer exomes. CRAVAT analysis identifies the mutations most likely to have a deleterious impact on protein function and those most likely to confer a selective advantage to cancer cells (drivers), using classifiers developed by our team. Classifier scores are supplemented with annotations, including population allele frequencies, previous occurrence in tumor tissue types, and gene functional categories, enabling filtering (e.g., removing polymorphisms) and prioritization. Gene-level annotation and scoring, by aggregation of classifier scores from mutations in a cohort, are also provided. MuPIT maps mutations from genomic positions onto protein structures and provides interactive viewing of mutations in the context of protein structure and in relation to a variety of annotations.
To enable prioritization of interesting mutations and genes, the application provides a preview describing each structure and all available annotations (e.g., binding sites, experimental mutagenesis results, and previously documented polymorphic and disease-associated variants). After selecting a PDB structure of interest, the user is led to an interactive visualization page, where an enhanced Jmol applet displays all SNVs mapped onto the structure. Frequently, many SNVs in the input list can be mapped onto a single structure, revealing clustering patterns around key functional sites. Since the debut of the two applications in August 2012, and based only on word of mouth, CRAVAT has been used by 129 unique users from 39 countries and has analyzed 1,136 submitted jobs totaling 27.9 million mutations. MuPIT has been used by 242 unique users from 25 countries, with 720 submitted jobs (source: Google Analytics).
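
The filtering and gene-level scoring steps described above can be illustrated with a minimal Python sketch. It is not the CRAVAT implementation: the `ScoredMutation` record, the 1% allele-frequency cutoff, and the use of the mean as the aggregation rule are all assumptions chosen for illustration, and the gene names and scores are placeholders rather than real results.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ScoredMutation:
    """Hypothetical record for one annotated mutation (not a CRAVAT data type)."""
    gene: str                # gene the mutation maps to
    score: float             # classifier score; assumed here: higher = more likely driver
    allele_frequency: float  # population allele frequency (0.0 if never observed)


def filter_polymorphisms(mutations, max_allele_frequency=0.01):
    """Drop mutations that are common in the population (assumed cutoff: 1%)."""
    return [m for m in mutations if m.allele_frequency < max_allele_frequency]


def gene_level_scores(mutations):
    """Aggregate per-mutation scores into one score per gene.

    The mean is an illustrative choice; the source does not specify the rule.
    """
    scores_by_gene = defaultdict(list)
    for m in mutations:
        scores_by_gene[m.gene].append(m.score)
    return {gene: mean(scores) for gene, scores in scores_by_gene.items()}


def prioritize(mutations, max_allele_frequency=0.01):
    """Filter likely polymorphisms, then rank genes by aggregated score (descending)."""
    kept = filter_polymorphisms(mutations, max_allele_frequency)
    scores = gene_level_scores(kept)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder cohort; values are made up for the example.
    cohort = [
        ScoredMutation("GENE_A", 0.75, 0.0),
        ScoredMutation("GENE_A", 0.25, 0.0),
        ScoredMutation("GENE_B", 0.875, 0.0),
        ScoredMutation("GENE_C", 0.5, 0.2),  # common variant, filtered out
    ]
    for gene, score in prioritize(cohort):
        print(f"{gene}\t{score:.3f}")
```

Filtering before aggregation keeps common polymorphisms from influencing a gene's score; a different aggregation rule (for example, a maximum or a combined p-value) could be substituted in `gene_level_scores` without changing the rest of the sketch.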

Public Health Relevance

The proposed work will harden and develop web applications that help the cancer genomics community interpret small-scale mutations in cancer exomes. The applications are designed to handle very large numbers of mutations and to provide analysis targeted at researchers who are not bioinformatics experts. The work will contribute to the understanding of the genetic complexity and heterogeneity of tumors and assist in the discovery of new approaches to cancer prognosis and treatment.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZCA1-SRLB-4 (O1))
Program Officer
Li, Jerry
Johns Hopkins University
Biostatistics & Other Math Sci
Schools of Engineering
United States