Some pioneering studies have shown that cancer driver/contributor genes may show allelic imbalance of gene expression. This important allele-specific expression (ASE) information has not been widely used in cancer studies because ASE is not available from traditional gene expression microarrays. However, with the rapid advance of sequencing techniques, RNA-seq is becoming more popular and may replace gene expression microarrays in the near future. With RNA-seq, transcript abundance is measured by the number of sequence reads, and ASE can be measured by the sequence reads that overlap heterozygous SNPs. Therefore, the only obstacle to using ASE in cancer studies is the development of appropriate statistical methods and data analysis strategies. These are the focus of the proposed research project. We propose to develop an unsupervised approach to identify genes with allelic imbalance of gene expression, develop new methods to associate allele-specific copy number (ASCN) changes with ASE, and combine genomic data from germline and tumor tissues to prioritize causal germline mutations without requiring control samples or very large sample sizes. We will apply our methods to genomic data from 248 colorectal cancer patients. Colorectal cancer is the second leading cause of cancer death among adults. Every year in the United States, 160,000 cases of colorectal cancer are diagnosed and 57,000 patients die of this disease. Our results will provide insight into the molecular mechanisms of colorectal cancer and thus help to identify therapeutic and drug development targets, ultimately reducing the burden of this disease. Our methods and data analysis strategies will also benefit many other cancer studies by aiding the identification of relevant germline mutations and tumor driver/contributor genes.
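To make the idea of measuring ASE from reads overlapping heterozygous SNPs concrete, the following is a minimal illustrative sketch, not the method proposed in this project. It assumes that the reads at each heterozygous SNP of a gene have already been assigned to one of two phased haplotypes, pools those counts across SNPs, and applies an exact binomial test against a 50:50 expectation. The function names and the input layout are hypothetical, and the test ignores overdispersion and mapping bias, which real RNA-seq data would require handling.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k. Intended for modest read counts only, since the
    full probability mass function is enumerated.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def allelic_imbalance(snp_counts):
    """Summarize allelic imbalance for one gene.

    snp_counts: list of (haplotype_1_reads, haplotype_2_reads) tuples, one
    per heterozygous SNP (assumed input format, assumed already phased).
    Returns (fraction of reads from haplotype 1, two-sided p-value).
    """
    hap1 = sum(a for a, _ in snp_counts)
    hap2 = sum(b for _, b in snp_counts)
    total = hap1 + hap2
    if total == 0:
        raise ValueError("no allele-specific reads for this gene")
    return hap1 / total, binomial_two_sided_p(hap1, total)


if __name__ == "__main__":
    # Made-up counts for two heterozygous SNPs in a single gene.
    fraction, p_value = allelic_imbalance([(30, 10), (25, 15)])
    print(f"haplotype 1 fraction = {fraction:.4f}, p = {p_value:.3g}")
```

A haplotype fraction far from 0.5 with a small p-value would flag the gene as a candidate for allelic imbalance under these simplifying assumptions.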

Public Health Relevance

It has long been recognized that there is considerable heterogeneity among cancer patients. Understanding this heterogeneity is of tremendous importance for applying personalized treatment. Relatively low-frequency tumor contributors and germline mutations may explain such heterogeneity, and our methodological innovations will provide a new route to harness genomic data to identify these low-frequency genetic and genomic variations. In this project, we will apply our methods to analyze a genomic dataset of 248 colorectal cancer patients derived from The Cancer Genome Atlas (TCGA) project. Our methods and the results of our real data analysis will benefit the detection, diagnosis, treatment, and prognosis of colorectal cancer, as well as other types of cancer.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Small Research Grants (R03)
Project #: 5R03CA167684-02
Application #: 8446327
Study Section: Special Emphasis Panel (ZCA1-SRLB-Q (J1))
Program Officer: Verma, Mukesh
Project Start: 2012-05-01
Project End: 2014-04-30
Budget Start: 2013-05-01
Budget End: 2014-04-30
Support Year: 2
Fiscal Year: 2013
Total Cost: $67,845
Indirect Cost: $20,845

Name: University of North Carolina Chapel Hill
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 608195277
City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599
Rashid, Naim U; Sun, Wei; Ibrahim, Joseph G (2014) Some Statistical Strategies for DAE-seq Data Analysis: Variable Selection and Modeling Dependencies among Observations. J Am Stat Assoc 109:78-94
Lin, Ja-An; Zhu, Hongtu; Mihye, Ahn et al. (2014) Functional-mixed effects models for candidate genetic mapping in imaging genetic studies. Genet Epidemiol 38:680-91
Szatkiewicz, Jin P; Wang, WeiBo; Sullivan, Patrick F et al. (2013) Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation. Nucleic Acids Res 41:1519-32