Genetic factors play an important role in the etiology of both sporadic and familial breast cancer, a complex, multifactorial disease. Genetic risk factors identified to date, including both rare high-penetrance genes and common low-penetrance variants, explain only about 28% of the heritability of breast cancer. Recent evidence strongly suggests that most of the heritable risk for breast cancer and other complex diseases may be due to a large number of low-frequency, moderate-penetrance genes that are difficult to identify using conventional family-based linkage analyses and genome-wide association studies (GWAS). In this application, we propose a novel study to systematically search the entire coding region of the human genome for new genetic susceptibility factors for breast cancer. This study will build upon the resources we established in three large NCI-funded epidemiologic studies conducted among women in Shanghai, in which genomic DNA samples and comprehensive clinical and epidemiological data were collected from nearly 8,000 breast cancer cases and a large number of community controls. Specifically, we propose to sequence the whole exome of 600 genetically enriched breast cancer cases and 600 controls (Stage 1). Using data from Stage 1 and from the 1000 Genomes Project, we will select approximately 350 promising genes for replication through variant genotyping (Stage 2) in an independent set of cases and controls. Approximately 20 genes that show promising associations in Stage 2 but require additional evaluation to confirm or reject the hypotheses will be selected for Stage 3 replication. To our knowledge, this is the first large association study of breast cancer using whole-exome sequencing.
With strong methodology and the use of novel technology and study design, the proposed study will identify novel genes and pathways that will significantly improve our understanding of breast cancer genetics and biology. Newly identified genes, particularly those with a substantial effect size, could serve as targets for novel cancer treatment and be used for cancer screening and risk assessment.

Public Health Relevance

Genetic factors play a major role in the etiology of breast cancer, yet only a small number of cases are explained by the genetic factors identified thus far. Using novel research designs and sequencing technologies, the proposed study will systematically search the human genome for genes that contribute to breast cancer susceptibility. Results from this study will significantly improve the understanding of breast cancer biology and genetics, which will be valuable in designing new therapies and cost-efficient prevention strategies for this common malignancy.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Epidemiology of Cancer Study Section (EPIC)
Program Officer
Nelson, Stefanie A
Vanderbilt University Medical Center
Internal Medicine/Medicine
Schools of Medicine
United States
Wu, Lang; Shi, Wei; Long, Jirong et al. (2018) A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat Genet 50:968-978
Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin et al. (2018) Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes. Hum Mol Genet 27:853-859
Wang, Jing; Raskin, Leon; Samuels, David C et al. (2015) Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 31:318-23
Zhang, Yanfeng; Cai, Qiuyin; Shu, Xiao-Ou et al. (2015) Whole-Exome Sequencing Identifies Novel Somatic Mutations in Chinese Breast Cancer Patients. J Mol Genet Med 9:
Zhang, Yanfeng; Li, Bingshan; Li, Chun et al. (2014) Improved variant calling accuracy by merging replicates in whole-exome sequencing studies. Biomed Res Int 2014:319534
Zhang, Yanfeng; Long, Jirong; Lu, Wei et al. (2014) Rare coding variants and breast cancer risk: evaluation of susceptibility Loci identified in genome-wide association studies. Cancer Epidemiol Biomarkers Prev 23:622-8
Guo, Yan; He, Jing; Zhao, Shilin et al. (2014) Illumina human exome genotyping array clustering and quality control. Nat Protoc 9:2643-62
Guo, Yan; Cai, Qiuyin; Li, Chun et al. (2013) An evaluation of allele frequency estimation accuracy using pooled sequencing data. Int J Comput Biol Drug Des 6:279-93