Cancer sequencing projects have identified a very large number of mutations whose cancer relevance is not yet understood. To address this need, our team has produced an application for high-throughput computational analysis of cancer mutations. The application, CRAVAT/MuPIT, accepts millions of mutations in a single batch upload and is designed to work as a funnel, so that users can narrow the very large number of mutations discovered in a sequencing project down to a tractable number of those most likely to be informative. The system enables multiple metrics to be combined to generate a prioritized list. Metrics include quality control of mutation calls, mutation consequence types, bioinformatic scores to identify pathogenic and driver mutations, population allele frequencies, previous occurrence in tumor tissue types, pointers to relevant literature, and interactive visualization on annotated protein structures, including customizable views of biologically important positions, mutations, and mutation hot spots from 21 TCGA tumor types. The CRAVAT/MuPIT application has been available through a web interface for 2½ years. In that time, analysis jobs have been submitted by 6,056 unique users from 88 countries on six continents (North America, South America, Europe, Asia, Australia and Africa). We have processed a total of 12,051 submissions, ranging from a few mutations to over 62 million. In total, the application has processed over 673 million mutations. (Source: Google Analytics.) We have attracted a user community that spans both basic and clinical cancer researchers, all of whom rely on high-throughput tumor sequencing in their work. To address the needs of our community going forward, we will extend the functionality of our application to more fully support analysis of non-coding mutations, identification of associations between mutations and drug response, and pathway-based interpretation of mutation impact.
We will dedicate substantial effort to ensuring that our tools are interoperable with other informatics services and can be run both in cloud environments and in locally installed pipelines to support protected data.
The proposal supports continued maintenance and development of web-based software applications for the cancer genomics community to interpret small-scale mutations. The applications are designed to handle very large numbers of mutations and to provide analysis targeted at researchers who are not bioinformatics experts. The work will contribute to understanding the genetic complexity and heterogeneity of tumors and assist in the discovery of new approaches to cancer prognosis and treatment.