The investigator explores and develops new principles for the design of statistical software to take advantage of modern computing power. Particular emphasis is placed on exploring the effective use of parallel computing, compilation, and code analysis for statistical languages. Pilot implementations are incorporated in open source statistical software systems.
Effective statistical methodology, made available through statistical software, is critical to the ability of researchers to make maximal use of experimental and observational data and to the ability of instructors to teach good research practices. The design principles developed by this research lead to software that helps researchers, instructors, and other users apply statistical methodology more effectively in scientific research and teaching and take full advantage of modern high-performance computational resources. These principles also lead to software frameworks that can be used to deliver new statistical methodology to end users more rapidly. Applications in a range of areas serve as testbeds for methods and principles developed in this research.
The objective of this research is to improve the ability of researchers, instructors, and other users of statistical methodology to apply statistical methods in new problem areas. The primary intellectual contribution of the research is the development of novel approaches that allow statistical and data analysis software to take advantage of modern computing power. In particular, the research has established the value of parallel computing, code compilation, and code performance analysis tools within a statistical computing and data analysis framework, and has explored approaches for supporting the very large data sets that are now becoming available. A prototype compiler for the R statistical software system has been developed and released, along with a parallel computing framework and a framework that enables existing software to be extended to handle much larger data sets than it was originally designed for. The broader impact of the research is to improve the ability of researchers, instructors, and other users of statistical methodology to apply this methodology effectively in scientific research and teaching. This is accomplished by making the results of the research available as enhancements and additions to the open source statistical software system R. This system is widely used in teaching, research, and industry, and strongly influences other software projects, both research and commercial.