Society is generating data at an unprecedented rate, currently estimated at 2.5 quintillion bytes daily. Many of these data sets are notably complex, particularly because they often involve interdependencies that are difficult to identify. In the field of cancer genomics, thousands of measurements can be obtained with the objective of discovering molecular signatures that characterize biological processes. However, advances in this area have been limited by major computational challenges involved in identifying the structures that are present in both healthy and cancerous cells. This project aims to develop new topological methods to detect hidden dependencies within and across different types of data obtained from breast cancer patients. The project will intensively train three graduate students each year in these novel methods and expand the undergraduate and graduate curricula in data analysis and applied topology. Results and materials will be broadly disseminated to the scientific community through publications in open-access and standard journals, conference presentations, and open-source software. Results will also be shared with the public, including teachers and students in grades 10 to 12, through training courses and art exhibits.

Genomic technologies have revolutionized the field of genetics over the past decade, providing new methods for identifying thousands of genetic and molecular signals associated with specific phenotypes. Among these methods, Genome-Wide Association Studies have accelerated the identification of specific genetic elements by testing thousands of genetic loci simultaneously. These approaches, however, are less useful for identifying co-occurrences of and interactions among genetic elements, conditions that appear to be ubiquitous in living organisms. To address this gap, the PIs will develop new mathematical methods to enable the identification of interactions among genetic elements in cancer, thereby testing the hypothesis that many cancer phenotypes are regulated by co-occurring genetic events. Using the combined tools of modern topological and data analyses, including machine learning techniques, the research team will identify such co-occurrences by analyzing generators of homology groups, implementing a computational data-driven theory of fiber bundles, and developing new models of cancer evolution using Khovanov-type categorification methods. The ultimate goal of this project is to develop new computational tools in time series analysis that help identify hidden interdependencies in data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1854705
Program Officer: Yong Zeng
Project Start:
Project End:
Budget Start: 2019-07-01
Budget End: 2022-06-30
Support Year:
Fiscal Year: 2018
Total Cost: $130,970
Indirect Cost:
Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695