Cancer presents a significant public health burden, with an incidence rate of 467.4 per 100,000 and a mortality rate of 189.8 per 100,000 in the United States. Reducing these rates requires improving both detection and treatment. An important aspect of these efforts is the identification of genetic variants associated with cancer, which have already yielded translational results that permit personalized approaches to cancer prevention and treatment. Modern high-throughput biological experiments, including gene expression and SNP arrays, provide an unprecedented ability to investigate the genetic causes of cancer in fine detail by simultaneously assaying 10^5 to 10^6 markers. However, cancer is a disease with heterogeneous and complex causes that involve multiple genes. Because the single-marker analytical approaches typically used in these studies are likely to miss complex multi-gene effects, there is a pressing need for analysis techniques that have the power to reveal the multi-gene, system-level changes driving carcinogenesis. To fill this methodological gap, I propose three novel techniques for pathway-based analysis of genomic data. These methods harness our current knowledge of biomolecular interaction networks (pathways). By summarizing the data across each pathway, the behavior of the pathway as a whole may be compared between cases and controls without requiring strong single-gene associations.
In Aim 1, I propose a method to formalize pathway summarization for genome-wide association study (GWAS) SNP data without relying on the significance of single-locus associations, thereby allowing comparisons of each pathway as a whole between case and control groups using genotype data.
In Aims 2 and 3, I propose two distinct methods for using pathway topology to create a summary statistic from gene expression data, again without relying on the significance of single-gene associations, thereby capturing not only the genes present in a pathway but their potential interactions as well. Each of these methods is highly novel: pathway summarization of this type for GWAS has not yet been reported, and the network characteristics I use for Aims 2 and 3 have never been applied to biological data. These methods complement existing analytical techniques and make it possible to identify target pathways for cancer prevention and treatment. By filling an important methodological gap, the proposed Aims would provide patient-centric analysis techniques that recognize the inherent complexity and diversity of cancer genetics, and thereby advance personalized medicine.
An understanding of the complex genetic determinants of cancer is crucial to improving early detection and designing personalized therapies. Analytical methods that identify the complex patterns of genetic differences that promote carcinogenesis will significantly improve predictions of cancer susceptibility and indicate targets for rational drug design, thereby reducing the public health burden of cancer.