Cancer sequencing projects have identified a very large number of mutations whose relevance to cancer is not yet understood. To address this need, our team has produced an application for high-throughput computational analysis of cancer mutations. The application, CRAVAT/MuPIT, accepts millions of mutations in a single batch upload and is designed to work as a funnel, so that users can narrow the very large number of mutations discovered in a sequencing project down to a tractable number of those most likely to be informative. The system combines multiple metrics to generate a prioritized list. Metrics include quality control of mutation calls, mutation consequence types, bioinformatic scores to identify pathogenic and driver mutations, population allele frequencies, previous occurrence in tumor tissue types, pointers to relevant literature, and interactive visualization on annotated protein structures, including customizable views of biologically important positions, mutations, and mutation hot spots from 21 TCGA tumor types.

The CRAVAT/MuPIT application has been available through a web interface for 2½ years. In that time, analysis jobs have been submitted by 6,056 unique users from 88 countries on six continents (North America, South America, Europe, Asia, Australia and Africa). We have processed a total of 12,051 submissions, ranging from a few mutations to over 62 million. In total, the application has processed over 673 million mutations (source: Google Analytics). We have attracted a user community that spans both basic and clinical cancer researchers, all of whom rely on high-throughput tumor sequencing in their work. To address the needs of our community going forward, we will extend the functionality of our application to more fully support analysis of non-coding mutations, identification of associations between mutations and drug response, and pathway-based interpretation of mutation impact.
We will dedicate substantial effort to ensuring that our tools are interoperable with other informatics services and can be run both in cloud environments and in locally installed pipelines that support protected data.

Public Health Relevance

The proposal supports continued maintenance and development of web-based software applications that help the cancer genomics community interpret small-scale mutations. The applications are designed to handle very large numbers of mutations and to provide analysis for researchers who are not bioinformatics experts. The work will contribute to understanding the genetic complexity and heterogeneity of tumors and will assist in the discovery of new approaches to cancer prognosis and treatment.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Resource-Related Research Projects--Cooperative Agreements (U24)
Project #: 5U24CA204817-05
Application #: 9868900
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Li, Jerry
Project Start: 2016-04-01
Project End: 2021-03-31
Budget Start: 2020-04-01
Budget End: 2021-03-31
Support Year: 5
Fiscal Year: 2020

Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 001910777
City: Baltimore
State: MD
Country: United States
Zip Code: 21205
Publications

Bailey, Matthew H; Tokheim, Collin; Porta-Pardo, Eduard et al. (2018) Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 174:1034-1035
Reiter, Johannes G; Makohon-Moore, Alvin P; Gerold, Jeffrey M et al. (2018) Minimal functional driver gene heterogeneity among untreated metastases. Science 361:1033-1037
Ng, Patrick Kwok-Shing; Li, Jun; Jeong, Kang Jin et al. (2018) Systematic Functional Annotation of Somatic Mutations in Cancer. Cancer Cell 33:450-462.e10
Wang, Yuxuan; Li, Lu; Douville, Christopher et al. (2018) Evaluation of liquid from the Papanicolaou test and other liquid biopsies for the detection of endometrial and ovarian cancers. Sci Transl Med 10:
Sajulga, Ray; Mehta, Subina; Kumar, Praveen et al. (2018) Bridging the Chromosome-centric and Biology/Disease-driven Human Proteome Projects: Accessible and Automated Tools for Interpreting the Biological and Pathological Impact of Protein Sequence Variants Detected via Proteogenomics. J Proteome Res :
Masica, David L; Douville, Christopher; Tokheim, Collin et al. (2017) CRAVAT 4: Cancer-Related Analysis of Variants Toolkit. Cancer Res 77:e35-e38
Glusman, Gustavo; Rose, Peter W; Prlić, Andreas et al. (2017) Mapping genetic variations to three-dimensional protein structures to enhance variant interpretation: a proposed framework. Genome Med 9:113
Tokheim, Collin; Bhattacharya, Rohit; Niknafs, Noushin et al. (2016) Exome-Scale Discovery of Hotspot Mutation Regions in Human Cancer Using 3D Protein Structure. Cancer Res 76:3719-31
Douville, Christopher; Masica, David L; Stenson, Peter D et al. (2016) Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel). Hum Mutat 37:28-35
Tokheim, Collin J; Papadopoulos, Nickolas; Kinzler, Kenneth W et al. (2016) Evaluating the evaluation of cancer driver genes. Proc Natl Acad Sci U S A 113:14330-14335