The analysis of large, complex data sets poses a major challenge for computational mathematics. Topological Data Analysis (TDA) applies concepts from algebraic topology to address this challenge. Topological techniques have been used successfully in engineering and biology but are seldom applied to the analysis of genomic data. In cancer genomics, the identification of copy number aberrations (CNAs), such as gains and losses of DNA segments, is an important problem because CNAs are known to contain cancer genes and therefore to be involved in misregulation of key signaling pathways. CNAs may occur independently or may co-occur. Co-occurring CNAs are believed to act synergistically and therefore to result in unforeseen consequences. For example, the finding that CNAs in 8p12 and 11q13.3 are co-amplified in some breast cancers led to the discovery of functional interactions between the MYC and the TP53 pathways. Such interactions offer an important paradigm for cancer research because many tumors are believed to use similar mechanisms for their progression. Identification of co-occurring CNAs has traditionally been hindered both by the large sample sizes required to find co-occurrences and by the lack of mathematical methods to identify them efficiently. Recently, large microarray-based data sets have become available for breast cancer. The proposed research aims to develop a new TDA-based method to detect independent and co-occurring CNAs in breast cancer. In the proposed approach, each CNA profile is characterized by a set of biologically meaningful topological spaces. Topological invariants of these spaces (i.e., Betti numbers) will be used to de-noise the data and identify CNAs. This project will yield broadly applicable methods for: (1) representing complex data sets in forms that are amenable to topological analysis; (2) determining the statistical significance of TDA results; and (3) computing topological invariants from large data sets.
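
The abstract does not say how the topological spaces are built from a CNA profile, so the following is only a generic illustration of what a Betti number is, not the project's method. It is a minimal Python sketch that computes the Betti numbers of a small simplicial complex from the ranks of its boundary matrices, working over the rationals. The function names (`matrix_rank`, `betti_numbers`) and the toy complexes are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations


def matrix_rank(rows):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def betti_numbers(maximal_simplices):
    """Betti numbers [b_0, b_1, ...] of the complex generated by the given simplices."""
    # Close the input under taking faces so every sub-simplex is present.
    simplices = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            simplices.update(combinations(s, k))

    top = max(len(s) for s in simplices) - 1
    by_dim = [sorted(s for s in simplices if len(s) == d + 1) for d in range(top + 1)]

    # ranks[d] is the rank of the boundary map from d-simplices to (d-1)-simplices.
    ranks = [0] * (top + 2)
    for d in range(1, top + 1):
        index = {face: i for i, face in enumerate(by_dim[d - 1])}
        rows = [[0] * len(by_dim[d]) for _ in by_dim[d - 1]]
        for j, simplex in enumerate(by_dim[d]):
            for pos in range(len(simplex)):
                face = simplex[:pos] + simplex[pos + 1:]
                rows[index[face]][j] = (-1) ** pos  # alternating boundary sign
        ranks[d] = matrix_rank(rows)

    # b_d = dim(C_d) - rank(boundary_d) - rank(boundary_{d+1})
    return [len(by_dim[d]) - ranks[d] - ranks[d + 1] for d in range(top + 1)]


hollow_triangle = [(0, 1), (1, 2), (0, 2)]   # one component, one loop
filled_triangle = [(0, 1, 2)]                # one component, no loop
print(betti_numbers(hollow_triangle))
print(betti_numbers(filled_triangle))
```

Here the zeroth Betti number counts connected components and the first counts independent loops; the hollow triangle has one loop, and filling it in removes the loop. For large data sets, a dedicated TDA library would be the practical choice over this dense elimination.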

Rapid advances in the sciences have generated large, complex data sets of unprecedented proportions. New mathematical methods are urgently needed to solve fundamental problems in the analysis of such high-dimensional data. In the field of genomics, thousands of measurements have been obtained with the goal of unveiling molecular signatures that characterize essential biological processes. This field has significantly influenced the direction of breast cancer research because of its potential for differentiating various subtypes, pathways and prognoses of the disease. Currently, the major approach to detecting genomic signatures is focused on the identification of single independent events. However, there is increasing evidence that copy number aberrations (CNAs), such as amplifications and deletions of the genome, are not always independent of one another; rather, they may co-occur with synergistic and unforeseen consequences. For example, co-occurring CNAs detected in breast cancer have led to the identification of cross-talk between different signaling pathways. The systematic search for co-occurring CNAs has been hampered by a lack of mathematical methods adequate to identify them. The PI proposes to develop new methods in Topological Data Analysis to identify co-occurring CNAs in breast cancer. Further, because copy number changes are associated with other diseases and with evolutionary processes, this project will have important impacts across the sciences, both basic (e.g., evolution and development) and applied (e.g., diseases with a genetic component such as cancer, autism and multiple sclerosis). The proposed research will advance the field of mathematical genetics/genomics with new tools for analyzing complex interactions among genetic elements in genetic/genomic data. The methods developed also have the potential for extension to identify co-occurring events in large, complex longitudinal data sets.
Furthering its broader impacts, the project will implement a series of public lectures on real-life applications of computational mathematics with special outreach to local school teachers, students and professionals.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1217324
Program Officer: Leland Jameson
Project Start:
Project End:
Budget Start: 2012-09-01
Budget End: 2016-08-31
Support Year:
Fiscal Year: 2012
Total Cost: $239,987
Indirect Cost:

Name: San Francisco State University
Department:
Type:
DUNS #:
City: San Francisco
State: CA
Country: United States
Zip Code: 94132