Breast cancer is the most common cancer in women in the United States and worldwide. Although genome-wide association studies have identified multiple loci for breast cancer, most of its heritability remains unexplained. To date, transcriptome-wide association studies (TWAS) have been performed to quantify associations of genetically predicted gene expression with breast cancer risk. Our recent work showed that genetic variants that affect RNA splicing are important contributors to complex traits but were previously missed when only the genetic effects on gene expression were considered. Therefore, evaluating associations of genetically predicted splicing (modeled as a linear combination of SNPs) with phenotypes holds great promise for discovering novel candidate disease genes. Splicing events in local regions (such as intron excision clusters) can be highly correlated. However, existing statistical methods for TWAS do not account for correlation among splicing events and may therefore lose power to detect disease genes. Additionally, splicing levels (quantified as relative count ratios) in a gene and the overall gene expression level have not been considered together in previous gene mapping methods. For breast cancer prevention, stratifying women by their risk of developing the disease could improve risk reduction and screening strategies by targeting those most likely to benefit. SNP-based polygenic risk scores have been developed to predict breast cancer, but their prediction accuracy remains low. To increase prediction accuracy, there is a need to incorporate information from genetically predicted expression and splicing. Recently, several transcriptome studies, such as GTEx, have collected DNA and RNA from multiple tissue samples; integrating information across multiple tissues into TWAS could significantly improve the identification of disease genes.
In addition, African Americans (AAs) have different linkage disequilibrium (LD) patterns from Europeans, so genetic variants that affect RNA splicing and disease phenotypes could be ethnicity-specific. The objective of this study is to develop effective methods for gene mapping and genetic risk prediction of complex traits such as breast cancer by integrating multi-omics data from multiple tissues. Specifically, we will 1) develop TWAS methods that leverage information on RNA splicing and expression from multiple tissues, and apply these methods to identify novel breast cancer susceptibility genes; and 2) develop joint polygenic risk scores for breast cancer that model the different LD patterns in distinct populations (including AAs) and incorporate information on genetically predicted splicing and gene expression from multiple tissues. We will account for correlation among splicing events in local regions and across multiple tissues. We expect the proposed methods to have higher power for gene mapping and higher accuracy for breast cancer prediction than existing methods. The proposed methods can also be applied to other complex diseases.
This study proposes a series of novel methods that use RNA splicing and gene expression data from multiple body tissues to discover genetic variations in genes responsible for disease development. We will apply these methods to identify susceptibility genes for breast cancer. The work has good potential to translate knowledge from genome-wide association studies into the practice of breast cancer screening and to advance the science of precision prevention.