Likelihood, introduced by Fisher, is a central concept in statistics from both frequentist and Bayesian viewpoints. The research project aims to advance a nonparametric likelihood approach that retains both the original meaning and the inferential power of Fisher's likelihood and, at the same time, to construct estimating functions geared towards point estimators and Wald-type confidence intervals. The research studies semiparametric models for two-sample and regression problems, both in the absence and in the presence of missing data. The project also investigates statistical tools for causal inference in longitudinal studies with time-dependent treatments and confounders. The investigator's education plan involves designing a course on nonparametric likelihood and estimating functions with applications to semiparametric models and causal inference; supervising students with various backgrounds; establishing a causal inference working group as a research and educational platform; and organizing causal inference workshops for researchers and students to facilitate communication and collaboration.

The research will improve the validity and accuracy of inferences about environmental exposures, medical treatments, and behavioral interventions, among others, in environmental, biomedical, and socioeconomic studies. The educational activities will help students from various backgrounds and researchers from various disciplines to acquire state-of-the-art statistical ideas and methods for empirical investigation and discovery.

Project Report

Statistical theory and methods have been developed in three main areas in the research project.

1. Causal inference and missing data problems. A central problem of causal inference is to evaluate treatment effects (e.g., effects of drugs, medical procedures, behavioral interventions, etc.) from observational data in biomedical and socioeconomic studies. A rigorous framework for addressing such problems is to consider causal inference as a missing data problem: there are two or more potential outcomes that would be observed under hypothetical treatments, but one and only one of them can actually be observed. In a series of articles, we developed novel statistical methods, using propensity scores or instrumental variables, for drawing causal inferences in a transparent, efficient, and robust manner. We also developed an R package, iWeigReg, for implementing the proposed methods for causal inference and missing data problems, publicly available at http://cran.r-project.org/web/packages/iWeigReg/index.html.

2. Survey sampling. Survey sampling is perhaps the oldest area of statistics and is often regarded as a unique area because the finite population under study is fixed whereas the sampling process is random. We proposed a novel approach to survey sampling by exploiting the connection between sampling and missing data problems: the data on individuals who are not in the sample are missing by design. This approach resolves two long-standing issues in survey calibration with auxiliary variables: how to achieve design-efficiency regardless of a linear super-population model in generalized regression and calibration estimation, and how to find a simple approximation in optimal regression estimation.

3. Monte Carlo computation. Monte Carlo methods are useful for tackling challenging problems in statistical and scientific computation. Our work further develops a likelihood approach to Monte Carlo integration using multiple samplers, which not only synthesizes a number of previous methods in statistics but also provides a binless extension of the Weighted Histogram Analysis Method (WHAM) used in physics and chemistry. For example, we proposed more effective likelihood methods to achieve a better tradeoff between computational cost and statistical efficiency. We also developed an R package, UWHAM, for implementing the unbinned Weighted Histogram Analysis Method for estimating normalizing constants and expectations from multiple distributions, publicly available at http://cran.r-project.org/web/packages/UWHAM/index.html.
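To make the idea of estimating normalizing constants from multiple samplers concrete, the following is a minimal Python sketch of the self-consistency equations usually associated with the unbinned (binless) WHAM estimator. It is an illustration only and is not the interface or implementation of the UWHAM R package. The function names `log_sum_exp` and `uwham_log_z`, the fixed-point iteration, and the two-Gaussian toy example are all assumptions introduced here. Draws from all distributions are pooled, and each normalizing constant is estimated relative to the first one.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def uwham_log_z(log_q, n, max_iter=1000, tol=1e-10):
    """Hypothetical helper: solve the unbinned WHAM self-consistency equations.

    log_q[k][i] is the log unnormalized density of distribution k evaluated
    at pooled draw i; n[k] is the number of draws taken from distribution k.
    Returns log Z_k with the convention log Z_0 = 0.
    """
    K = len(log_q)
    N = len(log_q[0])
    log_n = [math.log(m) for m in n]
    log_z = [0.0] * K
    for _ in range(max_iter):
        # log of the mixture weight sum_j n_j * q_j(x_i) / Z_j at each pooled draw
        log_denom = [
            log_sum_exp([log_n[j] + log_q[j][i] - log_z[j] for j in range(K)])
            for i in range(N)
        ]
        # Updated Z_k = sum_i q_k(x_i) / (mixture weight at x_i)
        new = [
            log_sum_exp([log_q[k][i] - log_denom[i] for i in range(N)])
            for k in range(K)
        ]
        # Constants are identified only up to a common factor; anchor at Z_0 = 1
        new = [v - new[0] for v in new]
        converged = max(abs(a - b) for a, b in zip(new, log_z)) < tol
        log_z = new
        if converged:
            break
    return log_z


# Toy example (assumed): q_k(x) = exp(-x^2 / (2 sigma_k^2)), so Z_k = sigma_k * sqrt(2 pi)
random.seed(0)
sigmas = [1.0, 2.0]
n = [2000, 2000]
draws = [random.gauss(0.0, s) for s, m in zip(sigmas, n) for _ in range(m)]
log_q = [[-0.5 * (x / s) ** 2 for x in draws] for s in sigmas]

log_z = uwham_log_z(log_q, n)
ratio = math.exp(log_z[1] - log_z[0])
print(ratio)  # estimate of Z_1 / Z_0, whose true value here is sigma_1 / sigma_0 = 2
```

In this toy setting the true ratio of normalizing constants is known, so the estimate can be checked directly. No histogram bins are used: every pooled draw enters the equations through the unnormalized densities evaluated at that draw.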

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
0749718
Program Officer
Gabor J. Szekely
Project Start
Project End
Budget Start
2007-07-01
Budget End
2013-02-28
Support Year
Fiscal Year
2007
Total Cost
$400,005
Indirect Cost
Name
Rutgers University
Department
Type
DUNS #
City
New Brunswick
State
NJ
Country
United States
Zip Code
08901