The collection of new data in any discipline does not, in general, lead to the creation of new knowledge. With the current data deluge, the human role in scientific discovery, traditionally so important, must now be partially fulfilled by powerful algorithms. However, current tools and technology start to break down when discovery and understanding, by the very nature of the science at hand, must happen in near real time.
New astronomical surveys coming online in the next few years, many observing the same regions of the sky repeatedly in time, will collect more data in the next decade than in all of human history so far. Opening up truly new vistas on the dynamic universe requires both rapid data processing and quick decisions about what available resources (e.g., telescopes) worldwide must be marshalled to study newly discovered phenomena. This necessitates an intelligent "real-time" machine-based decision or "classification" framework that should be able to deal with incomplete (and in some cases spurious) information.
This project will produce a framework for extracting novel science from large amounts of data in an environment where the computational needs vastly outweigh the available facilities, and intelligent (as well as dynamic) resource allocation is required. New theory will be developed that will allow current machine learning paradigms to scale to large parallel computing environments. The core result is the production, for projects generating thousands of gigabytes of new data a night (such as the proposed Large Synoptic Survey Telescope), of probabilistic statements about the physical nature of astronomical events. Uncovering anomalous events that do not fit easily into a currently accepted classification taxonomy (events that may lead to completely new scientific discoveries) will be particularly emphasized in this work.
Building these computational tools now with concrete scientific returns in mind will form the foundation for more rapid transformative applications in other fields with similar demands and constraints (high-frequency financial data, robotics, medical signal monitoring, geophysics, weather, and particle physics). This endeavor will also serve for years as a training ground for students and researchers across several departments and disciplines, and will broaden their scope towards a truly interdisciplinary education. By exposing students in the physical sciences to cutting-edge computer science and machine learning concepts, this project will provide a framework for computational thinking that will lead to future innovation.