Genetic factors play an important role in the etiology of colorectal cancer (CRC). To date, genome-wide association studies (GWAS) have identified approximately 50 genetic risk loci for CRC. However, these loci explain only a small fraction of heritability. Moreover, the target genes and underlying mechanisms for most of these risk loci remain unclear. The large majority of the risk variants are noncoding, and many have been shown to regulate gene expression. Recent studies suggest that ~80% of disease heritability can be explained by regulatory variants. However, each of these variants is associated with only a small alteration in disease risk, which makes them difficult to identify using GWAS. Recently, a novel approach, the transcriptome-wide association study (TWAS), was developed to systematically investigate the association of the transcriptome with disease risk. In TWAS, models are built to predict gene expression from cis-SNPs using a reference transcriptome dataset; these models are then applied to GWAS data to evaluate the associations of predicted gene expression with disease risk. Here, we propose to use this innovative approach to scan the whole transcriptome to discover novel CRC susceptibility genes and to uncover likely causal genes in loci revealed by previous GWAS.
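The two TWAS steps described above (build an expression prediction model from cis-SNPs in a reference panel, then apply it to GWAS genotypes and test the predicted expression against disease status) can be illustrated with a minimal Python sketch. This is a toy illustration under stated assumptions, not the modeling pipeline of the proposed study: the function names and the small genotype and expression arrays are hypothetical, per-SNP least-squares slopes stand in for the penalized regression models that TWAS typically uses, and a simple two-sample z-statistic stands in for a covariate-adjusted association test.

```python
from math import sqrt
from statistics import mean, variance


def fit_marginal_weights(genotypes, expression):
    """Step 1: per-SNP least-squares slope of expression on allele dosage.

    Simplifying assumption: each cis-SNP is fitted separately, which
    ignores linkage disequilibrium between SNPs.
    """
    y_bar = mean(expression)
    weights = []
    for j in range(len(genotypes[0])):
        dosages = [row[j] for row in genotypes]
        g_bar = mean(dosages)
        sxx = sum((g - g_bar) ** 2 for g in dosages)
        sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(dosages, expression))
        weights.append(sxy / sxx if sxx > 0 else 0.0)
    return weights


def predict_expression(genotypes, weights):
    """Step 2a: genetically predicted expression as a weighted dosage sum."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def association_z(pred_cases, pred_controls):
    """Step 2b: z-statistic comparing mean predicted expression in cases vs. controls."""
    se = sqrt(variance(pred_cases) / len(pred_cases)
              + variance(pred_controls) / len(pred_controls))
    return (mean(pred_cases) - mean(pred_controls)) / se


# Hypothetical reference panel: 6 individuals, 2 cis-SNPs (dosages 0/1/2),
# with measured expression of one gene.
ref_genotypes = [[0, 0], [1, 0], [2, 1], [0, 1], [1, 2], [2, 2]]
ref_expression = [1.0, 1.5, 2.1, 0.9, 1.4, 2.0]

# Hypothetical GWAS genotypes; expression is not measured in these samples.
case_genotypes = [[2, 1], [2, 2], [1, 2], [2, 0]]
control_genotypes = [[0, 0], [1, 0], [0, 1], [1, 1]]

weights = fit_marginal_weights(ref_genotypes, ref_expression)
z = association_z(predict_expression(case_genotypes, weights),
                  predict_expression(control_genotypes, weights))
print("SNP weights:", weights)
print("TWAS z-statistic:", z)
```

In this toy setup, a positive z-statistic indicates that higher genetically predicted expression of the gene is associated with case status; a real analysis would repeat this for every gene and correct for multiple testing.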
In Aim 1, we will conduct a TWAS in individuals of European descent. We will build expression prediction models for coding genes and noncoding RNAs in hundreds of colorectal tissue samples, in multiple other tissues, and across tissues, using transcriptome and high-density genotyping data from individuals of European ancestry in the Genotype-Tissue Expression (GTEx) project. We will then use these models to predict gene expression levels from GWAS data for approximately 27,911 CRC cases and 23,059 controls included in the ColoRectal Transdisciplinary Study (CORECT) and the Genetics and Epidemiology of Colorectal Cancer (GECCO) consortia, and evaluate the associations of predicted expression with CRC risk.
In Aim 2, we will conduct a TWAS in individuals of East Asian descent. We will generate transcriptome data and high-density genotyping data from 400 CRC patients of Asian ancestry from the Asia Colorectal Cancer Consortium (ACCC). We will use these data to build expression prediction models for coding genes and noncoding RNAs and then perform a TWAS in approximately 18,999 CRC cases and 31,269 controls from the ACCC.
In Aim 3, we will experimentally evaluate the biological functions of the top 30 genes identified in Aims 1 and 2. Based on the direction of the association between their expression levels and CRC risk, we will either suppress expression using CRISPRi or promote it using CRISPRa in multiple normal colon epithelial and CRC cell lines. We will then perform in vitro assays and analyze bioinformatics evidence to examine the biological functions of these selected genes and to assess their potential roles in regulating known cancer-related pathways. Our proposed study is highly cost-efficient, as both the transcriptome dataset (GTEx) for individuals of European descent and the GWAS data are already available to us. This proposed study will provide strong evidence for pinpointing CRC susceptibility genes, thereby facilitating the translation of our findings to cancer prevention and patient care.
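The direction-based choice between CRISPRi and CRISPRa in Aim 3 can be sketched in a few lines of Python. The mapping used here is an assumption made for illustration only, since the text does not specify it: a gene whose higher predicted expression is associated with increased CRC risk (positive TWAS z-statistic) is assigned to suppression with CRISPRi, and a gene with the opposite direction is assigned to activation with CRISPRa. The function name, gene labels, and z-statistics are hypothetical.

```python
def choose_perturbation(twas_z):
    """Pick a perturbation from the sign of a TWAS z-statistic.

    Assumed mapping (not stated in the source): positive association
    with risk -> suppress (CRISPRi); negative -> activate (CRISPRa).
    """
    if twas_z > 0:
        return "CRISPRi"
    if twas_z < 0:
        return "CRISPRa"
    raise ValueError("no association direction for a z-statistic of zero")


# Hypothetical genes and z-statistics, for illustration only.
candidates = {"GENE_A": 5.2, "GENE_B": -4.7}
plan = {gene: choose_perturbation(z) for gene, z in candidates.items()}
print(plan)
```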
Using an innovative approach and a cost-efficient research design, this proposed study will integrate transcriptome data, genetic regulation, GWAS data, and in vitro assays to discover candidate susceptibility genes for colorectal cancer. Results from this study will significantly improve the understanding of colorectal cancer biology and genetics. The novel genes identified could serve as targets for cancer treatment and chemoprevention.