This project addresses many of the challenges posed by three trends in data gathering and information processing. First, recent technological advances are pushing society toward the generation of massive quantities of data. Second, this massive data generation and collection is having an unintended consequence: the fraction of dirty data, defined as incomplete, grossly erroneous, or mislabeled data, within these data sets is increasing. Third, there is an increasing shift toward relying on interconnected sets of geographically distributed data for inference and decision making. Collectively, these three trends portend an inevitable transition to a data-driven world rife with big, dirty, and distributed data. Information processing in this new age of big, dirty, distributed data demands novel mathematical data models and robust computational and statistical tools.

The intellectual merit of this project lies in the ways it addresses the challenges of information processing for big, dirty, distributed data. First, it addresses the challenge of processing big, dirty data by developing the theoretical and algorithmic foundations of a novel geometric signal/data model. This model improves inference from big data, even in the presence of dirty data, because of its ability to faithfully capture the "ambient geometry" of big data. Second, it develops and analyzes novel collaborative processing algorithms that build on the developed model for improved inference from big, dirty data distributed across the world. The research agenda of this project impacts nearly every discipline that relies on advances in information processing for improved inference and decision making. In addition, it impacts society and the US healthcare system through its applications to early cancer detection, activity recognition in chaotic trauma bays, and collaborative digital pathology. The education agenda of this project impacts society and the US economy through K-12 and college-level outreach activities within New Jersey, modernization of the Rutgers signal processing curriculum, and training of undergraduate and graduate students for careers in data science.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 1453073
Program Officer: Phillip Regalia
Project Start:
Project End:
Budget Start: 2015-07-01
Budget End: 2021-06-30
Support Year:
Fiscal Year: 2014
Total Cost: $550,000
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854