Acceleration of computing power of supercomputers along with development and deployment of large instruments such as telescopes, colliders, sensors and devices raises one fundamental question. "Can the time to insight and knowledge discovery be reduced at the same exponential rate?" The answer currently is clearly "NO", because a critical step that combines analytics, mining and discovering knowledge from the massive datasets has lagged far behind advances in software, simulation and generation of data. Analysis of data requires "data-driven" computing and analytics. This entails scalable software for data reduction, approximations, analysis, statistics, and bottom-up discovery. Scalable and parallel analytics software for processing large amount of data is required in order to make a significant leap forward in scientific discoveries. This project develops innovative, scalable, and sustainable data analytics algorithms to enable analysis and mining of massive data on high-performance parallel computers, which include (1) bottom-up and unsupervised data clustering algorithms that are suitable for spatio-temporal data, massive graph analytics, community computations, and detection of patterns in time-varying graphs, different types of data, and different data characteristics; (2) change detection and anomaly detection in spatio-temporal data; and (3) tracking moving data and cluster dynamics within certain time and space constraints. These parallel algorithms use the massive amount of data generated from scientific applications, such as astrophysics, cosmology simulations, climate modeling, and social networking analysis, for result verification and performance evaluation on modern high-performance parallel computers. This project directly addresses the critical needs for spatio-temporal data analysis, performance scalability, and programming productivity of large-scale scientific discovery via parallel analytics software for big data. This work will impact applications of enormous societal benefits and scientific importance such as climate understanding, environmental sustainability, astrophysics, biology and medicine by accelerating scientific discoveries. Furthermore, the developed software infrastructure can be used and adopted in commercial applications, such as commerce, social, security, drug discovery, and so on. The source codes are open to the public for all community to adapt, build-upon, customize and contribute to, thereby multiplying its value and usage.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
1409601
Program Officer
Almadena Chtchelkanova
Project Start
Project End
Budget Start
2014-06-01
Budget End
2019-05-31
Support Year
Fiscal Year
2014
Total Cost
$709,342
Indirect Cost
Name
Northwestern University at Chicago
Department
Type
DUNS #
City
Chicago
State
IL
Country
United States
Zip Code
60611