Overview: The astrophysics community has produced a Petabyte of of imaging data in a wide range of wavelength channels in the last decade, and is planning to produce a thousand times more in the next. At the same time, the computer science machine learning community has developed powerful methods for extracting knowledge scalably from large, heterogenous data sets. This project is to construct a model--a detailed quantitative explanation--for every pixel of every digital astronomical image ever taken by any telescope in the world, including those from amateurs and hobbyists.

Technical Abstract

proposed model is a justified approximate probabilistic model, making extensive use of non-parametric Bayesian methods. The model will be hierarchical in nature, with the higher layers capturing regularities among stars and galaxies, and the lower layers will accurately model the image formation process, incorporating all the various noise processes. The internal parameters of the model will contain the best possible astronomical catalog given the input data; no current astronomical catalog is built using either hierarchical probabilistic inference, or built from the union of all available data. Science will be enabled by this catalog as it has been enabled by all previous astronomical catalogs: it will contain the position, brightness, temperature, parallax, and proper motion of every star and position, intensity, and morphology of every galaxy, even for sources for which there is evidence in the collection of data but not (sufficiently) in any individual image. Other internal parameters of the model will contain a quantitative description of calibration properties for all the image-generating hardware. All these products will help to refine and extend an existing astrophysics software and services for calibration and automated data processing.

Broader impacts: This project provides unique opportunities in citizen science since it leverages images taken by amateur and hobbyist astronomers in creating astronomical knowledge by offering them opportunities to contribute in exactly the same fashion as professional astronomers. The project draws together two disparate fields: machine learning and astronomy, and thus has the potential to create a new sub-area of "inferential astronomy". The project offers enhanced opportunities for research-based advanced interdisciplinary training for postdoctoral, graduate and undergraduate researchers. All code created for this project will be released under the open-source license and all model components, parameters, and other internals will be freely disseminated to the wider research community through the project websites at: http://Astrometry.net/ and http://thetractor.org/ .

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Application #
Program Officer
Vasant G. Honavar
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
New York University
New York
United States
Zip Code