The collection of new data in any discipline does not, in general, lead to the creation of new knowledge. With the current data deluge, the human role in scientific discovery, traditionally so important, must now be partially fulfilled by powerful algorithms. However, current tools and technology start to break down when discovery and understanding, by the very nature of the science at hand, must happen in near real time.

New astronomical surveys coming online in the next few years, many observing the same regions of the sky repeatedly in time, will collect more data in the next decade than in all of human history so far. Opening up truly new vistas on the dynamic universe requires both rapid data processing and quick decisions about what available resources (e.g., telescopes) worldwide must be marshalled to study newly discovered phenomena. This necessitates an intelligent "real-time" machine-based decision or "classification" framework that should be able to deal with incomplete (and in some cases spurious) information.

This project will produce a framework for extracting novel science from large amounts of data in an environment where the computational needs vastly outweigh the available facilities, and intelligent (as well as dynamic) resource allocation is required. New theory will be developed that will allow current machine learning paradigms to scale to large parallel computing environments. The core result is the production, for projects generating thousands of gigabytes of new data a night (such as the proposed Large Synoptic Survey Telescope), of probabilistic statements about the physical nature of astronomical events. Uncovering anomalous events that do not fit easily into a currently accepted classification taxonomy - events that may lead to completely new scientific discoveries - will be particularly emphasized in this work.

Building these computational tools now with concrete scientific returns in mind will form the foundation for more rapid transformative applications in other fields with similar demands and constraints (high-frequency financial data, robotics, medical signal monitoring, geophysics, weather, and particle physics). This endeavor will also serve for years as a training ground for students and researchers across several departments and disciplines, and will broaden their scope towards a truly interdisciplinary education. By exposing students in the physical sciences to cutting-edge computer science and machine learning concepts, this project will provide a framework for computational thinking that will lead to future innovation.

Agency: National Science Foundation (NSF)
Institute: Office of International and Integrative Activities (IIA)
Type: Standard Grant (Standard)
Application #: 0941742
Program Officer: Thomas F. Russell
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $1,573,550
Indirect Cost:

Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94704