Cancer presents a significant public health burden, with an incidence rate of 467.4 per 100,000 and a mortality rate of 189.8 per 100,000 in the United States. Improving these figures requires advances in both detection and treatment. An important aspect of these efforts is the identification of genetic variants associated with cancer, which have already yielded translational results that permit personalized approaches to cancer prevention and treatment. Modern high-throughput biological experiments, including gene expression and SNP arrays, provide an unprecedented ability to investigate the genetic causes of cancer in fine detail by simultaneously assaying 10^5 to 10^6 markers. However, cancer is a disease with heterogeneous and complex causes that involve multiple genes. Because the single-marker analytical approaches typically used in these studies are likely to miss complex multi-gene effects, there is a pressing need for analysis techniques that have the power to reveal the multi-gene, system-level changes driving carcinogenesis. To fill this methodological gap, I propose three novel techniques for pathway-based analysis of genomic data. These methods harness our current knowledge of biomolecular interaction networks (pathways). By summarizing the data across each pathway, the behavior of the pathway as a whole may be compared between cases and controls without requiring strong single-gene associations.
In Aim 1, I propose a method to formalize pathway summarization for genome-wide association study (GWAS) SNP data without relying on the significance of single-locus associations, thereby allowing each pathway as a whole to be compared between case and control groups using genotype data.
In Aims 2 and 3, I propose two distinct methods for using pathway topology to create a summary statistic from gene expression data, again without relying on the significance of single-gene associations, thereby capturing not only the genes present in a pathway but also their potential interactions. Each of these methods is highly novel: pathway summarization of this type for GWAS has not yet been reported, and the network characteristics I use for Aims 2 and 3 have never been applied to biological data. These methods complement existing analytical techniques and make it possible to identify target pathways for cancer prevention and treatment. By filling an important methodological gap, the proposed Aims would provide patient-centric analysis techniques that recognize the inherent complexity and diversity of cancer genetics, and thereby advance personalized medicine.

Public Health Relevance

An understanding of the complex genetic determinants of cancer is crucial to improving early detection and designing personalized therapies. Analytical methods that identify complex patterns of genetic differences which promote carcinogenesis will significantly improve predictions of cancer susceptibility and indicate targets for rational drug design, thereby reducing the public health burden of cancer.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Career Transition Award (K22)
Study Section
Subcommittee G - Education (NCI)
Program Officer
Jakowlew, Sonia B
Northwestern University at Chicago
Public Health & Prev Medicine
Schools of Medicine
United States
Braun, Rosemary (2014) Systems analysis of high-throughput data. Adv Exp Med Biol 844:153-87
Braun, Rosemary; Finney, Richard; Yan, Chunhua et al. (2013) Discovery analysis of TCGA data reveals association between germline genotype and survival in ovarian cancer patients. PLoS One 8:e55037
Chen, Qing-Rong; Braun, Rosemary; Hu, Ying et al. (2013) Multi-SNP analysis of GWAS data identifies pathways associated with nonalcoholic fatty liver disease. PLoS One 8:e65982
Braun, Rosemary; Buetow, Kenneth (2011) Pathways of distinction analysis: a new technique for multi-SNP analysis of GWAS data. PLoS Genet 7:e1002101
Braun, Rosemary; Leibon, Gregory; Pauls, Scott et al. (2011) Partition decoupling for multi-gene analysis of gene expression profiling data. BMC Bioinformatics 12:497