Big Data analytics requires bridging the gap between data-intensive computing and data-driven computing to obtain actionable insights. The former has primarily focused on optimizing data movement, reuse, organization and storage, while the latter has focused on hypothesis-driven, bottom-up data-to-discovery and the two fields have evolved somewhat independently. This exploratory project aims to investigate a holistic Ecosystem that optimizes data generation from simulations, sensors, or business processes (Transaction Step); organizes this data (possibly combining with other data) to enable reduction, pre-processing for downstream data analysis (Organization Step); performs knowledge discovery, learning and mining models from this data (Prediction Step); and leads to actions (e.g., refining models, new experiments, recommendation) (Feedback Step).
Intellectual Merit: As opposed to the current practice of considering optimizations in each step in isolation, the project considers scalability and optimizations of the entire Ecosystem for big data analytics as part of the design strategy. The project aims to consider big data challenges in designing algorithms, software, analytics, and data management. This strategy contrasts with traditional approaches that first design algorithms for small data sizes and then scale them up. The project aims to treat data complexity, computational requirement, and data access patterns as a whole when designing and implementing algorithms, software and applications.
Broader Impacts: The project could advance the state of the art in big data analytics across a number of key applications such as Climate Informatics and Social Media Analytics. The software resulting from the project is being made available to the broder scientific community under open source license. The project offers enhanced opportunities for education and training of graduate students and postdoctoral researchers at Northwestern University.