Astronomical research is undergoing a transformation due to the proliferation of publicly available online datasets from all types of telescopes, and a large international effort is already underway to federate these diverse datasets for ready use by astronomers. A particularly important class of data arises from multi-epoch wide-field surveys, which are essentially 'movies' of the sky. These advances in time domain astronomy are crucial for such diverse and important research topics as exoplanet discovery, supernovae and other transients, variable stars, and accretion phenomena. However, most astronomers use only a narrow range of classical statistical methods for interpreting these large datasets. This problem can now be alleviated with the R statistical computing environment and its rapidly growing CRAN add-on packages. This project will bring the R software capabilities into the astronomical research community and introduce specialized astrostatistical methodology into R.

In particular, the research includes two complementary projects. First, CRAN packages will be developed for the analysis of time domain data with irregularly spaced observation times. This is a difficulty rarely encountered in other fields but common in multi-epoch astronomical studies, due to diurnal cycles, satellite orbits, survey cadence patterns, and telescope allocation limitations. Astronomers have developed a wide range of treatments for such problems, but most have not been evaluated statistically or incorporated into widely-used software packages, so a part of this study will be a statistical evaluation of competing methods. Second, the prototype VOStat Web service will be developed into a major tool and integrated into the growing Virtual Astronomical Observatory (VAO) software environment. VOStat will provide dozens of functionalities in many areas of applied statistics: data manipulation and visualization, nonparametric statistics and density estimation, probability density functions, regression and inference, multivariate analysis, clustering and classification, censoring and truncation, time series analysis, spatial point processes and image processing. These achievements will improve the statistical sophistication within the VAO and for thousands of other astronomical studies.

These software developments will improve the statistical analysis of a large number of astronomical research studies every year. Coding within R has the simultaneous advantage of inheriting the large infrastructure of methodology and graphics, itself of enormous value to the entire astronomical community. While the production of CRAN packages directly allows wide dissemination of the code, integrating the code into the VAO software environment through VOStat will make it conveniently accessible to all astronomers. A strong pedagogical component will further encourage less experienced astronomers to learn and use more advanced statistical methods. In addition, the CRAN packages on astrostatistical methods for irregular time series may have value to statisticians, physicists and economists who also might encounter datasets of this type.

Project Report

Two software environments now dominate professional data science: the general-purpose language Python and the discipline-specific statistical language R. Astronomers are fully engaged in producing core software products in Python through the Astropy Project, but surprisingly few astronomers were familiar with R. Astronomers were historically wedded to IDL, a commercial data analysis package that provided few statistical functions. They thus had no access to the >100,000 functions available in R and its >5000 add-on CRAN packages, a situation that promotes an antiquated approach to astronomical data analysis.

We produced CRAN packages that are specifically designed to strengthen the disciplinary foundations for astronomy. They include:

astrolibR: a collection of 64 functions translated from the long-standing IDL Astronomy Users Library, with useful, often essential, operations for astronomical data.

astroFITSR: a CRAN package that provides the largest and most comprehensive software for reading and writing data files in the Flexible Image Transport System (FITS) format, which has been ubiquitously used by astronomical institutions and researchers for 40 years. It is a wrapper to the CFITSIO package of ~100 subroutines in C that is approved by the International Astronomical Union FITS Working Group.

Another product that resulted from this project is VOStat, a statistical web service targeted to astronomers as well as scientists from other fields who require data analysis in their research. Users can upload their data to the VOStat server (http://vostat.org), choose an analysis from a list of available statistical methods, and get the output back in their browser. The engine behind the service is R, the largest public domain statistical software environment.

Agency: National Science Foundation (NSF)
Institute: Division of Astronomical Sciences (AST)
Type: Standard Grant (Standard)
Application #: 1047586
Program Officer: Nigel Sharp
Budget Start: 2010-09-15
Budget End: 2014-08-31
Fiscal Year: 2010
Total Cost: $450,000

Institution: Pennsylvania State University
City: University Park
State: PA
Country: United States
Zip Code: 16802