Pioneering studies have shown that cancer driver/contributor genes may show allelic imbalance of gene expression. This important allele-specific expression (ASE) information has not been widely used in cancer studies because ASE is not available from traditional gene expression microarrays. However, with the rapid advance of sequencing techniques, RNA-seq is becoming more popular and may replace gene expression microarrays in the near future. With RNA-seq, transcript abundance is measured by the number of sequence reads, and ASE can be measured by the sequence reads that overlap heterozygous SNPs. Therefore, the only obstacle to using ASE in cancer studies is the development of appropriate statistical methods and data analysis strategies. These are the focus of the proposed research project. We propose to develop an unsupervised approach to identify genes with allelic imbalance of gene expression, develop new methods to associate allele-specific copy number (ASCN) changes with ASE, and combine genomic data from germline and tumor tissues to prioritize causal germline mutations without requiring control samples or very large sample sizes. We will apply our methods to study genomic data from 248 colorectal cancer patients. Colorectal cancer is the second leading cause of cancer death among adults. Every year in the United States, 160,000 cases of colorectal cancer are diagnosed and 57,000 patients die of this disease. Our results will provide insight into the molecular mechanisms of colorectal cancer and thus help to identify therapeutic and drug development targets, ultimately reducing the burden of this disease. Our methods and data analysis strategies will also benefit many other cancer studies by aiding the identification of relevant germline mutations and tumor driver/contributor genes.
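As a minimal illustration of how ASE might be quantified from reads overlapping a single heterozygous SNP, the sketch below compares reference-allele and alternate-allele read counts against the 50/50 split expected under balanced expression. The exact binomial test and the function names `binomial_two_sided_p` and `allelic_imbalance` are assumptions made for illustration only; they are not the statistical methods proposed in this project, which would also need to address issues such as overdispersion and mapping bias.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k. Intended for modest read depths only.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference-allele fraction, p-value) for one heterozygous SNP.

    Hypothetical helper: the null hypothesis is that both alleles are
    expressed equally, so each read is from the reference allele with
    probability 0.5.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads overlap this SNP")
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


if __name__ == "__main__":
    # Illustrative counts, not real data.
    for ref, alt in [(10, 10), (18, 2)]:
        fraction, p_value = allelic_imbalance(ref, alt)
        print(f"ref={ref} alt={alt} fraction={fraction:.2f} p={p_value:.3g}")
```

Under these assumptions, a SNP with balanced counts yields a p-value near 1, while a strongly skewed split yields a small p-value, suggesting possible allelic imbalance at that site.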
It has long been recognized that there is considerable heterogeneity among cancer patients. Understanding this heterogeneity is essential for applying personalized treatment. Relatively low-frequency tumor contributors and germline mutations may explain such heterogeneity, and our methodological innovations will provide a new route to harness genomic data to identify such low-frequency genetic and genomic variations. In this project, we will apply our methods to analyze a genomic dataset of 248 colorectal cancer patients derived from The Cancer Genome Atlas (TCGA) project. Our methods and the results of our real data analysis will benefit the detection, diagnosis, treatment, and prognosis of colorectal cancer, as well as other types of cancer.