Astronomy has entered an era of massive data streams, with catalogs containing hundreds of millions of stars and galaxies, each measured at thousands of time-steps and described by hundreds of attributes. To extract knowledge from these large and complex data sets we must account for noise and gaps, and understand if and when we may have detected a fundamentally new physical phenomenon. The problem is not solely the size of the data; it is the basic question of how to discover, represent, visualize and interact with the knowledge that these data contain. Astronomical data provide a popular testbed for developing methods applicable throughout the physical and life sciences.

astroML is an open-source machine-learning library that addresses these challenges, providing a publicly available repository of fast Python implementations of statistical routines for astronomy, as well as examples of astrophysical data analyses using techniques from statistics and machine learning. In the three years since its release, astroML has been installed over 21,000 times. The current project will further develop astroML into a general machine-learning toolkit for the next generation of astrophysical surveys by adding code examples and tutorials, exploiting multicore and multiprocessing hardware, and supporting the second edition of the text "Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data". Planned developments include approximate Bayesian computation, hierarchical Bayes, an interface to deep learning algorithms, and modifications to the regression and regularization code to account for uncertainties within the data.

All developed algorithms will be publicly available. astroML has already been used in cancer research, in analysis of the securities market, and to teach data science in astronomy. The refactored code can be used to teach both the statistics and the software engineering techniques needed for large-scale machine learning.

National Science Foundation (NSF)
Division of Astronomical Sciences (AST)
Standard Grant (Standard)
Program Officer: Nigel Sharp
University of Washington
United States