Lung cancer (LC) is the leading cause of cancer-related death in the United States. Although genome-wide association studies have identified many LC susceptibility loci, much of its heritability remains unexplained and may be partly accounted for by copy number variation. Studies to date have provided robust evidence for distinct roles of copy number variants (CNVs) in many cancer types; however, their effects on LC risk and the molecular mechanisms through which they contribute to LC remain unclear. The overall objective of this R21 project is to conduct a comprehensive study leveraging datasets from the large-scale Transdisciplinary Research in Cancer of the Lung (TRICL) consortium, the Lung eQTL dataset, and two public data resources to discover high-confidence CNVs predisposing to LC across histological subtypes. The central hypothesis is that CNVs are associated with LC susceptibility, regulate gene expression, and have the potential to serve as novel biomarkers for LC risk prediction. This hypothesis will be tested by pursuing two specific aims: 1) determine the effect of CNVs on LC risk; and 2) characterize the regulatory impact of CNVs on gene expression.
In Aim 1, using data from a large collection of LC cases and controls (n = 36,068 in total) from the TRICL consortium, we will rigorously evaluate CNVs as novel biomarkers of LC predisposition. First, CNVs will be called using a change-point-based method, modSaRa2, and a Hidden Markov Model-based approach, PennCNV. We will then apply a gene-based collapsing association test to identify duplications and deletions associated with LC. Significant associations will be validated in an independent replication dataset, the Environment and Genetics in Lung cancer Etiology (EAGLE) dataset (n = 4,221). We will also identify novel pathways, networks, and interactions underlying LC that are significantly enriched for LC-susceptibility CNVs.

Because understanding the biological mechanisms through which CNVs influence LC risk is critical, in Aim 2 we will evaluate the regulatory impact of CNVs on gene expression, an intermediate phenotype for many complex traits. Genomic measures from the Lung eQTL study (n = 1,038) and the public GTEx dataset (n = 383) will be used to evaluate associations between the identified LC-susceptibility CNVs and the expression of their corresponding genes. Functional experiments will then test the downstream roles of the newly identified CNV-regulated genes in the growth and progression of cancer cells. This project has the potential to fill a gap in current knowledge about the utility of CNVs as a distinct type of genetic variation influencing LC risk and to provide a better understanding of the underlying molecular mechanisms. Our innovative, integrative genetic, genomic, and bioinformatic approaches will identify novel genetic factors predisposing to LC.
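The gene-based collapsing test in Aim 1 can be illustrated with a minimal sketch, assuming a simple design: for one gene and one CNV type, every sample carrying at least one overlapping call is collapsed into a single "carrier" flag, and carrier counts are compared between cases and controls. The record layouts, the helper names (`carriers`, `fisher_exact_two_sided`, `collapsing_test`), and the toy samples are hypothetical, and the choice of a two-sided Fisher's exact test without covariates is an assumption made for illustration; it is not necessarily the test used in this project.

```python
from math import comb
from typing import Iterable, List, Set, Tuple

# Hypothetical record layouts (not the study's file formats):
# a CNV call is (sample_id, chromosome, start, end, cnv_type), cnv_type "DEL" or "DUP";
# a gene region is (chromosome, start, end). Coordinates are treated as half-open.
CNVCall = Tuple[str, str, int, int, str]
GeneRegion = Tuple[str, int, int]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def carriers(cnvs: Iterable[CNVCall], gene: GeneRegion, cnv_type: str) -> Set[str]:
    """Collapse all CNVs of one type overlapping the gene into a per-sample carrier set."""
    chrom, g_start, g_end = gene
    return {
        sample
        for sample, c, start, end, kind in cnvs
        if kind == cnv_type and c == chrom and overlaps(start, end, g_start, g_end)
    }


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    n_cases, n_controls = a + b, c + d
    k = a + c  # total number of carriers
    total = comb(n_cases + n_controls, k)

    def prob(x: int) -> float:
        return comb(n_cases, x) * comb(n_controls, k - x) / total

    p_obs = prob(a)
    lo, hi = max(0, k - n_controls), min(k, n_cases)
    # Small relative tolerance so tied tables are not lost to floating-point rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def collapsing_test(
    cnvs: List[CNVCall], gene: GeneRegion, cnv_type: str,
    cases: List[str], controls: List[str],
) -> Tuple[Tuple[int, int, int, int], float]:
    """Compare carrier counts between cases and controls for one gene and CNV type."""
    hit = carriers(cnvs, gene, cnv_type)
    a = sum(1 for s in cases if s in hit)
    c = sum(1 for s in controls if s in hit)
    table = (a, len(cases) - a, c, len(controls) - c)
    return table, fisher_exact_two_sided(*table)


# Toy, made-up data: 6 of 10 cases carry a deletion overlapping the gene.
cases = [f"case{i}" for i in range(10)]
controls = [f"ctrl{i}" for i in range(10)]
gene = ("chr5", 1000, 2000)
cnvs = [(f"case{i}", "chr5", 1500, 2500, "DEL") for i in range(6)]
cnvs += [
    ("case0", "chr5", 1800, 1900, "DEL"),  # second hit in the same sample, counted once
    ("ctrl0", "chr5", 5000, 6000, "DEL"),  # outside the gene, ignored
    ("ctrl1", "chr5", 1200, 1300, "DUP"),  # duplication, ignored in a deletion test
]
table, p = collapsing_test(cnvs, gene, "DEL", cases, controls)
print(table, round(p, 4))  # (6, 4, 0, 10) 0.0108
```

Collapsing to a per-sample flag means that rare, non-identical CNVs hitting the same gene contribute jointly to a single test, which is the motivation for gene-based rather than per-CNV testing. At the scale of TRICL, a covariate-adjusted regression (for example, including ancestry principal components) would plausibly replace the unadjusted exact test, but that choice is an assumption here.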
This study has considerable potential to open new research directions on how CNV characteristics can be used in the future risk management and treatment of complex human diseases.
This project aims to conduct a comprehensive and integrative study of copy number variants and gene expression in lung cancer. We expect the outcomes of this project to substantially advance the evaluation of copy number variants as novel biomarkers of lung cancer, with the long-term goals of uncovering CNV-driven mechanisms underlying human diseases and providing new directions for the risk management and treatment of lung cancer and other complex diseases.