This exploratory application aims to develop novel computational approaches for identifying infectious agents that contribute to cancer. To achieve this, we will merge data from two distinct approaches: gene expression profiling and metagenomics. Approximately 20% of all cancers worldwide are associated with infectious agents, including viruses, bacteria, and parasites. It is likely that this number is an underestimate and that many more cancers are caused by agents that await discovery. One powerful approach to uncovering potential cancer-causing microorganisms is virtual subtraction, in which tumor genomic sequences or gene expression profiles are searched for non-human sequences. Potential cancer-causing agents are then identified among these non-human sequences by searching public nucleic acid and protein databases for similar sequences that can be associated with a known virus, bacterium, or other organism. A major limitation of this approach is that sequences from most organisms, especially microorganisms, are not represented in the databases. Metagenomics is an approach in which specific biomes are sampled for all microorganisms, followed by deep sequencing. Individual species are then identified by comparing the sequence reads obtained with sequences deposited in public databases. Studies from a number of laboratories, including our own, have shown that most sequences obtained in metagenomic surveys do not match anything in existing databases, suggesting that they are derived from previously uncharacterized agents. For example, our studies suggest that the 3,000 or so currently known viruses represent less than 0.01% of viruses in nature. Similarly, the vast majority of bacterial species await discovery and characterization. We propose to search gene expression profile data to determine whether any of these novel metagenomic sequences are expressed in tumors.
We will also develop the computational tools that will allow uncharacterized viruses, bacteria or other organisms that we identify to be isolated and their association with cancer studied. The identification of new potential tumorigenic infectious agents will have a direct impact on the diagnosis and treatment of cancer.
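
The two-step idea described above (subtract human-derived reads, then ask whether any remaining reads match uncharacterized metagenomic sequences) can be illustrated with a toy sketch. This is not the project's actual pipeline: the function names, the exact k-mer matching, the thresholds, and the example sequences are all assumptions chosen for illustration. A real analysis would use a proper aligner against a full human reference and would also handle reverse complements and sequencing errors.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(sequences: Iterable[str], k: int) -> Set[str]:
    """Pool the k-mers of several reference sequences into one lookup set."""
    index: Set[str] = set()
    for s in sequences:
        index |= kmers(s, k)
    return index


def shared_fraction(read: str, index: Set[str], k: int) -> float:
    """Fraction of a read's distinct k-mers that are present in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


def virtual_subtraction(reads: Iterable[str], human_index: Set[str],
                        k: int, max_human: float = 0.5) -> List[str]:
    """Keep reads that look non-human (hypothetical threshold max_human)."""
    return [r for r in reads if shared_fraction(r, human_index, k) < max_human]


def match_metagenome(reads: Iterable[str], contigs: Dict[str, str],
                     k: int, min_shared: float = 0.8) -> Dict[str, List[str]]:
    """Assign each read to every metagenomic contig it shares enough k-mers with."""
    indexes = {name: kmers(seq, k) for name, seq in contigs.items()}
    hits: Dict[str, List[str]] = {}
    for r in reads:
        for name, idx in indexes.items():
            if shared_fraction(r, idx, k) >= min_shared:
                hits.setdefault(name, []).append(r)
    return hits


# Toy data (made up for illustration, not real sequences).
K = 5
human_reference = ["ACGTACGTACGTTTGACCA"]
metagenomic_contigs = {"contig_1": "GGGCCCTTTAAAGGGCCCATAT"}
tumor_reads = ["ACGTACGTACGT", "CCCTTTAAAGGG", "TTTTTTTTTTTT"]

human_index = build_index(human_reference, K)
non_human = virtual_subtraction(tumor_reads, human_index, K)
hits = match_metagenome(non_human, metagenomic_contigs, K)
```

In this toy run, the first read is discarded as human-derived, the second survives subtraction and matches `contig_1`, and the third survives subtraction but matches nothing. The third case corresponds to the limitation noted above: a non-human read with no counterpart in the available sequence collections.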

Public Health Relevance

To design diagnostics and therapies for different cancers, we must know what is causing them. This project is aimed at discovering infectious agents that cause or contribute to cancer. The identification and characterization of these agents will thus lead to new methods for the diagnosis and treatment of cancer.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Exploratory/Developmental Grants (R21)
Study Section
Special Emphasis Panel (ZCA1-SRLB-D (M1))
Program Officer
Read-Connole, Elizabeth Lee
University of Pittsburgh
Schools of Arts and Sciences
United States
Cantalupo, Paul G; Katz, Joshua P; Pipas, James M (2015) HeLa nucleic acid contamination in the cancer genome atlas leads to the misidentification of human papillomavirus 18. J Virol 89:4051-7
Mishra, Nischay; Pereira, Marcus; Rhodes, Roy H et al. (2014) Identification of a novel polyomavirus in a pancreatic transplant recipient with retinal blindness and vasculitic myopathy. J Infect Dis 210:1595-9