Recent biomedical research has made great progress in unveiling the complexity of human disease, while technological breakthroughs now allow much more detailed analysis of molecular behavior. As a result, however, experimental results are frequently too complex to synthesize within the traditional model of tracing disease through individual genes. Instead, molecular pathways are gaining prominence as a new framework for analytic research. Pathways integrate information from across the entire genome while mirroring real biological processes. Disruption of the normal behavior of a pathway as a whole, not necessarily of a single component of the pathway, could be the basis for disease. As yet, there is no robust or straightforward means of transforming the large-scale molecular expression data common to most genetic studies into meaningful data at the pathway level. To facilitate this promising mode of investigation, the PathOlogist has been developed as a resource capable of systematic and efficient pathway-centric analysis of molecular data.
The PathOlogist is a new tool designed to automatically analyze large sets of genetic data within the context of molecular pathways. The tool aims to facilitate both quantitative and qualitative analysis of pathway behavior in a format accessible to both laboratory researchers and informatics analysts. Foremost, the PathOlogist uses RNA expression data to calculate two descriptive metrics, activity and consistency, for each pathway in a set of more than 500 canonical pathways (source: Pathway Interaction Database http://pid.nci.nih.gov). Activity scores measure how likely the interactions within the pathway are to occur, while consistency scores measure pathway logic by comparing the expected outcome of each interaction with the observed outcome. Pathway scores can be generated for any number of samples and for any subset of the entire pathway collection.
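As a rough illustration of the two metrics, the following Python sketch scores a toy pathway. It assumes that each gene's expression has already been converted to a probability of being in an "up" state (`p_up`), and it uses simple products and averages. The `Interaction` structure, the function names, and the exact formulas are assumptions made for this example and are not taken from the PathOlogist itself.

```python
from dataclasses import dataclass, field
from math import prod
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Interaction:
    """One pathway interaction (hypothetical structure for this sketch)."""
    inputs: List[str]    # molecules the interaction requires
    outputs: List[str]   # molecules the interaction is expected to produce
    inhibitors: List[str] = field(default_factory=list)  # molecules that block it


def activity(inter: Interaction, p_up: Dict[str, float]) -> float:
    """Probability-like score that the interaction can occur."""
    present = prod(p_up[g] for g in inter.inputs)
    unblocked = prod(1.0 - p_up[g] for g in inter.inhibitors)
    return present * unblocked


def consistency(inter: Interaction, p_up: Dict[str, float]) -> float:
    """Agreement between the expected outcome and the observed output state."""
    a = activity(inter, p_up)
    out = prod(p_up[g] for g in inter.outputs)
    # High when an active interaction has its outputs up,
    # or when an inactive interaction has its outputs down.
    return a * out + (1.0 - a) * (1.0 - out)


def pathway_scores(pathway: List[Interaction],
                   p_up: Dict[str, float]) -> Tuple[float, float]:
    """Average the per-interaction scores over the whole pathway."""
    act = mean([activity(i, p_up) for i in pathway])
    cons = mean([consistency(i, p_up) for i in pathway])
    return act, cons


# Toy sample: gene names and probabilities are made up for illustration.
p_up = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.7}
pathway = [Interaction(inputs=["A", "B"], outputs=["D"], inhibitors=["C"])]
act, cons = pathway_scores(pathway, p_up)
```

Running `pathway_scores` once per sample gives a pathway-by-sample matrix of scores, which is the kind of input that clustering and association tests operate on.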
The program then allows a detailed exploration of the results through integrated visualization of pathway components, structure, and scores; hierarchical clustering of pathways and samples; and statistical analyses designed to identify associations between pathway scores and clinical features such as cancer type or patient survival. The PathOlogist provides a powerful means of identifying common molecular processes implicated in disease. By viewing molecular behavior at the pathway level, the metrics generated by the PathOlogist often provide more insight into disease pathology than could be gained from individual gene-based analyses. The tool is already being used for such diverse applications as predicting response to cancer treatment and identifying molecular signatures associated with cancer phenotype.
In addition to the PathOlogist, the Buetow lab uses Pathways of Distinction Analysis (PoDA). We applied PoDA to 2287 genotypes obtained from the Cancer Genomic Markers of Susceptibility (CGEMS) breast cancer study. Briefly, the samples comprised 1145 breast cancer cases and a comparable number (1142) of matched controls drawn from the participants of the Nurses' Health Study. All participants were American women of European descent. The samples were genotyped using Illumina 550K arrays, which assay over 550,000 SNPs across the genome. To provide a preliminary assessment of the validity of PoDA with observational data, we first examined a SNP set comprising the four SNPs in intron 2 of FGFR2 that were reported to show significant association with case status in (59). As expected, we observed a significant difference. Next, we applied PoDA systematically to the pathways represented in PID (28) using the CGEMS data. A total of 69,453 SNPs in the data could be associated with at least one of the pathways. These SNPs represented 4446 unique genes, and the most significant SNP for each gene was retained for further analysis.
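The reduction from SNPs to one representative SNP per gene can be sketched as follows. This is a minimal illustration that assumes each SNP has already been mapped to a gene and assigned a single-marker p-value. The record layout, the function name, and the identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def most_significant_snp_per_gene(
    records: Iterable[Tuple[str, str, float]],
) -> Dict[str, Tuple[str, float]]:
    """Keep, for every gene, the SNP with the smallest p-value.

    Each record is (snp_id, gene, p_value); ties keep the first SNP seen.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for snp_id, gene, p_value in records:
        if gene not in best or p_value < best[gene][1]:
            best[gene] = (snp_id, p_value)
    return best


# Made-up identifiers and p-values, for illustration only.
records = [
    ("snp_1", "GENE_A", 0.20),
    ("snp_2", "GENE_A", 0.003),
    ("snp_3", "GENE_B", 0.40),
]
selected = most_significant_snp_per_gene(records)
```

Keeping a single SNP per gene stops genes covered by many markers from dominating a pathway's SNP set, although choosing by minimum p-value is itself a selection step that later significance testing has to account for.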
Wilcoxon p-values comparing cases and controls were computed for each pathway and corrected for multiple hypothesis testing using FDR adjustment (60,61); significant pathways were then reassessed by resampling against dummy pathways to adjust for pathway size. The most significantly associated pathway was focal adhesion. Interestingly, this pathway is already being targeted by novel cancer therapeutic drugs (62-64). Four networks (FGF signaling, MAPK signaling, regulation of actin cytoskeleton, and prostate cancer) contained FGFR2. All yielded significant p-values; however, only regulation of actin cytoskeleton was significant in comparison to randomly generated pathways of the same length. To assess whether the result is due solely to the presence of FGFR2, we eliminated the FGFR2 SNP from the regulation of actin cytoskeleton pathway and recomputed the p-values; while the Wilcoxon p-value rose, it remained highly significant, suggesting that the association of actin cytoskeleton regulation with breast cancer is not driven solely by differences in FGFR2.
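The two correction steps can be illustrated with a short Python sketch. The text does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption. The size-matched resampling function, its parameter names, and the caller-supplied `stat_fn` are likewise hypothetical stand-ins for whatever pathway statistic is being tested.

```python
import random
from typing import Callable, List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def size_matched_empirical_p(
    observed_stat: float,
    pathway_size: int,
    all_genes: Sequence[str],
    stat_fn: Callable[[List[str]], float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> float:
    """Compare a pathway statistic with random gene sets of the same size.

    stat_fn maps a gene list to a statistic where larger means stronger
    association (an assumption of this sketch).
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        dummy_pathway = rng.sample(all_genes, pathway_size)
        if stat_fn(dummy_pathway) >= observed_stat:
            at_least_as_extreme += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Made-up per-pathway p-values, for illustration only.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A pathway that passes the FDR threshold but has a large empirical p-value against size-matched dummy pathways is likely significant mainly because of its length, which is the situation the resampling step is meant to detect.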
Greenblum, Sharon I; Efroni, Sol; Schaefer, Carl F et al. (2011) The PathOlogist: an automated tool for pathway-centric analysis. BMC Bioinformatics 12:133