Ovarian cancer is the fifth leading cause of cancer death in women in the United States, and only 45% of women survive five years or more after diagnosis. A major shift in the understanding of high-grade serous ovarian cancer (HGSC), the most common and aggressive form, has arisen from the report of four robust subtypes based on the analysis of whole-genome gene expression data. However, there are demonstrated challenges in accurately classifying HGSC samples: in The Cancer Genome Atlas (TCGA) data, 18% of samples more closely resemble a subtype other than the one to which they were assigned, and more than 80% of samples are members of more than one subtype. We posit that these challenges arise because ovarian tumor samples belong to a continuum of expression subtypes, and that this biological feature hampers subtype assignment by traditional crisp analyses, which assume that each sample belongs to a single subtype. To address this, we will first develop and evaluate an ensemble fuzzy algorithm that incorporates this biological feature by assigning all samples to all subtypes with varying degrees of membership. We will then characterize the performance of this algorithm in comparison to that of crisp algorithms on the TCGA HGSC data. Lastly, we will refine the gene lists selected by the best-performing algorithm through the incorporation of annotated biological pathways. This refinement will create a new gene list that we expect to offer consistent predictive ability across diverse datasets. The completion of these research aims will provide invaluable training and experience in the construction of experimental design through simulation, the appropriate validation of new methodology in the context of existing methods, and the biological and pathway interpretation of bioinformatics results in order to engender epidemiological hypotheses.
Mastery of these key scientific competencies is essential for the advancement of a career in the biomedical sciences. Additionally, the robust and reliable classifier of HGSC samples constructed from the refined gene list is essential for advancing research in ovarian cancer epidemiology and personalized treatment, and the methods employed are applicable to a wide variety of phenotypes.
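The contrast between fuzzy and crisp subtype assignment can be illustrated with a minimal sketch. It uses the standard fuzzy c-means membership formula with fixed centroids. It is not the ensemble fuzzy algorithm proposed here, whose details are not specified in this summary. The function names, the fuzzifier `m = 2`, and the two-gene toy centroids are all hypothetical and chosen only for illustration.

```python
import math


def fuzzy_memberships(sample, centroids, m=2.0):
    """Fuzzy c-means style membership of one sample in every subtype.

    Memberships are non-negative and sum to 1. `m` > 1 is the fuzzifier;
    values closer to 1 give assignments that are closer to crisp.
    """
    dists = [math.dist(sample, c) for c in centroids]
    # A sample lying exactly on a centroid belongs fully to that centroid.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
        for d_k in dists
    ]


def crisp_assignment(memberships):
    """Collapse fuzzy memberships to a single subtype index (first maximum)."""
    return max(range(len(memberships)), key=memberships.__getitem__)


# Hypothetical toy data: four subtype centroids in a two-gene expression space.
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
# A sample that sits midway between the first two subtypes.
sample = (2.0, 0.0)

memberships = fuzzy_memberships(sample, centroids)
label = crisp_assignment(memberships)
```

In this toy case the sample is equally close to the first two centroids, so a crisp rule must pick one of them arbitrarily, while the fuzzy memberships record that the sample resembles both. This is the kind of information that a single-label assignment discards.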
We will develop and evaluate a novel data analysis algorithm in order to more accurately classify ovarian cancer tumor samples into subtypes. This more accurate classification will allow for a better understanding of the risk factors associated with each subtype, and will move research in personalized treatment forward.
Way, Gregory P; Rudd, James; Wang, Chen et al. (2016) Comprehensive Cross-Population Analysis of High-Grade Serous Ovarian Cancer Supports No More Than Three Subtypes. G3 (Bethesda) 6:4097-4103
Rudd, James; Zelaya, René A; Demidenko, Eugene et al. (2015) Leveraging global gene expression patterns to predict expression of unmeasured genes. BMC Genomics 16:1065