Prostate cancer is the most commonly diagnosed cancer and the second-leading cause of cancer death in US men. Its prognosis is heterogeneous: many men have an indolent disease course, while others have aggressive disease that progresses to metastases and death. Classifying tumors by the recognized molecular subtypes of prostate cancer does not necessarily provide prognostic information. An accessible, reliably curated database of high-throughput molecular data from prostate tumors and adjacent normal tissue, together with relevant clinical annotations, would greatly accelerate progress in two areas: distinguishing potentially lethal from indolent disease, and identifying molecular subtypes of prostate cancer that may predict therapeutic response. We propose to develop the largest harmonized, multi-study dataset for prostate cancer, designed specifically for the systematic development and extensive multi-study validation of translationally relevant multi-omic biomarkers and molecularly defined subtypes. We will develop and apply a standardized data processing pipeline and consistently capture all reported clinical features of patients across more than 45 public datasets. To ensure the integrity of these clinical features, we will curate them manually. In addition to the clinical annotations currently available for these specimens, we will computationally estimate tumor purity, immune infiltration, and the contribution of the surrounding stroma. We will test the hypothesis that these estimated microenvironmental factors affect our ability to derive molecular subtypes, and that they must be controlled for to robustly define prostate cancer molecular subtypes associated with clinically meaningful outcomes. The dataset compiled in this project will be made publicly available through the curatedProstateData package and GitHub.
Classifying prostate tumors by recognized molecular subtypes does not necessarily provide prognostic information, which hampers much-needed progress in distinguishing potentially lethal from indolent disease. To identify robust, prognostically relevant subtypes of prostate cancer, we propose to assemble and harmonize a large compendium of prostate cancer data and then annotate these samples with computationally derived estimates of tumor purity, immune infiltration, and stromal contribution. Taking this new information into account, we propose the first multi-study evaluation of prostate cancer molecular subtypes in relation to clinicopathologic features and outcomes.
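One common way to "control for" estimated microenvironmental factors is to regress each gene's expression on the estimated purity, immune, and stromal scores, then carry the residuals forward into subtype discovery. This is shown purely as an illustration and is not necessarily the approach this proposal will adopt. The Python sketch assumes a samples-by-genes matrix of log-scale expression and a samples-by-covariates matrix of estimated scores. The function name `residualize` and the randomly generated toy inputs are hypothetical and do not represent real data.

```python
import numpy as np


def residualize(expr, covariates):
    """Remove the linear effects of covariates from every gene (hypothetical helper).

    expr: (n_samples, n_genes) expression matrix, e.g. log-scale values.
    covariates: (n_samples, n_covariates) estimated scores, e.g. tumor
        purity, immune infiltration and stromal contribution (assumed inputs).
    Returns the per-gene ordinary least-squares residuals.
    """
    n_samples = expr.shape[0]
    # Design matrix: an intercept column plus the covariate scores
    X = np.column_stack([np.ones(n_samples), covariates])
    # Fit all genes at once; beta has shape (n_covariates + 1, n_genes)
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    # Subtract the covariate-explained (linear) component
    return expr - X @ beta


# Synthetic toy inputs for illustration only (not real data)
rng = np.random.default_rng(0)
n_samples, n_genes = 50, 200
purity = rng.uniform(0.3, 0.9, size=n_samples)
immune = rng.uniform(0.0, 1.0, size=n_samples)
stromal = rng.uniform(0.0, 1.0, size=n_samples)
covs = np.column_stack([purity, immune, stromal])

# Expression in which every gene partly tracks purity
expr = rng.normal(size=(n_samples, n_genes)) + 3.0 * purity[:, None]

adj = residualize(expr, covs)  # adjusted matrix passed on to subtype clustering
```

Ordinary least squares makes the residuals orthogonal to every column of the design matrix, so the adjusted expression is linearly uncorrelated with the covariate scores within the dataset. A linear adjustment will not remove nonlinear or interaction effects, and it can also remove biological signal that genuinely tracks purity. Testing whether such adjustment changes the derived subtypes is a question of the kind the hypothesis above poses.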