Triple-negative breast cancer (TNBC) is defined by lack of expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER-2) and is characteristically an aggressive cancer, especially in the metastatic setting. Approximately 15-20% of all breast cancers are TNBC. In spite of recent improvements in TNBC treatment, the lack of known specific therapeutic targets and the heterogeneous response to chemotherapy make it difficult to treat TNBC with consistent outcomes and meaningful benefit. Cisplatin chemotherapy has recently regained interest, based on growing preclinical and clinical evidence of improved outcomes. However, many TNBC patients do not respond to the treatment, and there is no clinically practical way to identify the individuals in whom cisplatin chemotherapy will be effective, so as to avoid unnecessary toxicity and healthcare cost. The objective of this study is to develop a computational framework, based on signal processing and machine learning techniques, for identifying novel candidate biomarkers of cisplatin response in TNBC more accurately and efficiently from next-generation sequencing (NGS) data. The recent discovery that p63/p73 expression, p53 mutation, and measures of DNA repair status affect sensitivity to cisplatin in TNBC patients indicates that predictors of cisplatin response exist and warrant further investigation. In this study, we will develop a novel sequence-based copy number variation (CNV) detection tool, using signal processing techniques, and a novel supervised integrative analysis tool, based on Bayesian network analysis, which integrates CNV, point mutation, and gene expression data. We will hone and validate these methods and tools on publicly available data such as The Cancer Genome Atlas (TCGA) data.
Then, in collaboration with oncologists and pathologists from Beth Israel Deaconess Medical Center (BIDMC) and using the Dana-Farber/Harvard Cancer Center DNA Resource Core services, we will generate novel DNA sequence and RNA-seq datasets from responsive and non-responsive TNBC tumor samples from an existing clinical trial designed to study preoperative cisplatin in early-stage breast cancer. By applying the proposed computational framework, we will shed unprecedented light on potential predictors of TNBC response to cisplatin therapy that can help guide biomarker selection. We will verify the candidate biomarkers through gene ontology and pathway analyses. In addition, we will analyze TCGA data to determine the prevalence of these candidate biomarkers in TNBC.

Public Health Relevance

The objective of this study is to develop a computational framework, based on signal processing and machine learning techniques, to more accurately and efficiently identify novel candidate biomarkers of cisplatin response in triple-negative breast cancer (TNBC) from next-generation sequencing data. Successful completion of this proposal will have two important public health impacts: (1) candidate biomarkers of response to cisplatin chemotherapy in TNBC, and (2) a computational approach supporting personalized medicine for TNBC. Furthermore, once established, this framework can be extended to the detection of biomarkers in other tumor types and can contribute to improving the drug development process and the effectiveness of cancer care.

National Institutes of Health (NIH)
Career Transition Award (K99)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Harvard Medical School
Schools of Medicine
United States