Missing data often plague analysts' attempts to interpret clinical and social data. We propose to produce a software toolkit, S+MissingData, that enables medical researchers to apply principled methods of handling missing data without wasting either data or labor. S+MissingData will implement missing data procedures that can be applied more or less routinely to a wide variety of missing data problems. The proposed research rests on three foundations: (1) a particular model-based approach, (2) a variety of recently developed computational tools, and (3) implementation in a modern statistical computing environment. In Phase II, we will implement tools for creating and managing objects used in statistical analysis with missing data. Graphics and a graphical user interface will make the software easy to learn and use. Research will extend current methods to handle data arising from two extremely important sources: longitudinal designs and complex survey designs. S+MissingData will enable medical researchers to earn a greater return on their investment in collecting data by extracting as much information as possible and achieving reliable inferences despite missing data.
This research will produce an add-on software module for S-PLUS called S+MissingData, which will offer a scientifically sound and cost-effective way to handle missing data. We expect a wide market. This module will appeal to existing S-PLUS users and attract new users in disciplines as diverse as biology, medicine, sociology, marketing, and economics. This research will also lead to short courses, books, videos, and other educational material.
Schafer, J L (1999) Multiple imputation: a primer. Stat Methods Med Res 8:3-15