The investigators develop a framework for causal inference from two-level factorial and fractional factorial designs, with particular attention to applications in the social, behavioral and biomedical sciences. The framework uses the concept of potential outcomes, which is central to causal inference, and extends both Neyman's repeated sampling approach to the estimation of causal effects and randomization tests based on Fisher's sharp null hypothesis to the case of two-level factorial experiments. The framework allows for statistical inference from a finite population, permits definition and estimation of parameters other than "average factorial effects," and leads to more flexible inference procedures than those based on ordinary least squares estimation from a linear model. It also ensures validity of statistical inference when the investigation becomes an observational study in lieu of a randomized factorial experiment due to randomization restrictions.
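
As an illustration only, the following minimal Python sketch shows how the two ingredients named above might look for a completely randomized two-level factorial experiment with equal replication: a Neyman-style point estimate of a main effect with a conservative variance estimate, and a randomization test of Fisher's sharp null hypothesis that no treatment combination affects any unit. All function names are hypothetical, and the sketch is not the investigators' implementation. It assumes equal cell sizes with at least two units per treatment combination.

```python
import itertools
import random
import statistics


def design_matrix(k):
    """All 2**k treatment combinations, each factor coded -1 / +1."""
    return list(itertools.product((-1, 1), repeat=k))


def assign(n_per_cell, combos, rng):
    """Completely randomized assignment with n_per_cell units per combination."""
    labels = [c for c in combos for _ in range(n_per_cell)]
    rng.shuffle(labels)
    return labels


def cell_values(y, labels):
    """Group observed outcomes by the treatment combination received."""
    cells = {}
    for yi, lab in zip(y, labels):
        cells.setdefault(lab, []).append(yi)
    return cells


def main_effect(y, labels, factor):
    """Average of cell means at +1 minus average of cell means at -1."""
    means = {lab: statistics.fmean(v) for lab, v in cell_values(y, labels).items()}
    high = [m for lab, m in means.items() if lab[factor] == 1]
    low = [m for lab, m in means.items() if lab[factor] == -1]
    return statistics.fmean(high) - statistics.fmean(low)


def neyman_variance(y, labels, k):
    """Conservative variance estimate: sum of s_j^2 / n_j, scaled by 4**-(k-1)."""
    cells = cell_values(y, labels)
    total = sum(statistics.variance(v) / len(v) for v in cells.values())
    return total / 4 ** (k - 1)


def randomization_p_value(y, labels, factor, rng, draws=2000):
    """Monte Carlo test of the sharp null: outcomes stay fixed, labels are reshuffled."""
    observed = abs(main_effect(y, labels, factor))
    shuffled = list(labels)
    hits = 0
    for _ in range(draws):
        rng.shuffle(shuffled)
        if abs(main_effect(y, shuffled, factor)) >= observed:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    rng = random.Random(2024)
    labels = assign(5, design_matrix(2), rng)
    # Simulated outcomes (made-up numbers): factor 0 shifts the response, factor 1 does not.
    y = [1.5 * lab[0] + rng.gauss(0.0, 1.0) for lab in labels]
    print("estimated main effect of factor 0:", main_effect(y, labels, 0))
    print("conservative variance estimate:", neyman_variance(y, labels, 2))
    print("randomization p-value:", randomization_p_value(y, labels, 0, rng))
```

The point estimate and the randomization test use only the assignment mechanism and the observed outcomes, so neither relies on a linear model for the response.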

Factorial designs allow efficient and cost-effective assessment of the relative effects of several factors and their interactions on output variables of interest. Such designs have been applied successfully in many scientific, engineering and industrial settings, but they are not often used in the social, behavioral or biomedical sciences despite many potential applications in these fields. The proposed methodology addresses the complications associated with multi-factor experiments in these fields and has a wide range of applications. It can be applied, for example, to assess the impact of several new initiatives on high-school education; to conduct cost-effective clinical trials that study the individual and combined effects of different treatments offered to patients suffering from a certain disease; or to identify critical factors that affect the yield of complex physical processes in materials science, such as the synthesis of nanostructures. It can also be applied to comparative effectiveness research (e.g., in evidence-based medicine).
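
To make the efficiency argument concrete, here is a small Python sketch, under stated assumptions, that enumerates a full two-level factorial design and a half fraction of it. The function names are hypothetical, and the half fraction uses one common textbook construction (the last factor is set to the product of the others). It is not a design taken from this project.

```python
import itertools
import math


def full_factorial(k):
    """All 2**k runs of a two-level design, each factor coded -1 / +1."""
    return list(itertools.product((-1, 1), repeat=k))


def half_fraction(k):
    """2**(k-1) runs in which the last factor equals the product of the first k-1."""
    runs = []
    for base in itertools.product((-1, 1), repeat=k - 1):
        runs.append(base + (math.prod(base),))
    return runs


if __name__ == "__main__":
    for run in half_fraction(3):
        print(run)
```

With three factors the full design needs eight runs and the half fraction needs four. The price is aliasing: in the half fraction the third factor's column is identical to the product of the first two, so its main effect cannot be separated from their two-factor interaction.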

Project Report

Experimental designs constitute a class of statistical methodology in which one or more inputs or experimental factors (treatments) are deliberately varied to study their individual and combined causal effects on some measurable characteristic (response) associated with a group of experimental units. Among the various types of experimental designs, factorial designs involve simultaneous exploration of the causal effects of several factors on one or more outcome variables (responses) of interest. The analysis of factorial experiments is typically driven by the manner in which the experimental units (e.g., plots of land) are assigned to treatment combinations (e.g., types of irrigation and fertilizer). Randomized allocation is considered the "gold standard" because it guarantees balance of uncontrollable variables across treatment combinations on average and provides a basis for inference.

Originally motivated by agricultural experiments, factorial designs have found extensive applications in industrial experimentation as well as in the physical sciences. In comparison, they have found few applications in social, behavioral and biomedical (SBB) experiments. Recently, however, there has been considerable interest in applying such designs to SBB experiments. Examples include educational experiments, clinical trials and stem-cell experiments. The difficulties involved in applying factorial designs to SBB experiments include:
(a) Large unit-to-unit differences due to the presence of a large number of uncontrollable covariates (e.g., the different medical conditions of a group of patients enrolled in a clinical trial).
(b) The desire to distinguish between inference for the finite population of units in the study (e.g., volunteers) and inference for a larger population.
(c) Practical difficulties in randomizing units to certain treatment combinations (e.g., schools refusing to accept a new bonus scheme for teachers that is being tried as an intervention to improve school performance).

Our research involves the development of a basic framework for drawing causal inferences from factorial experiments in which one or more of the above challenges exist. Our methodological contribution has been disseminated through three papers in peer-reviewed journals (two accepted, one under review) and two doctoral dissertations. We have also developed computational tools, in the form of R packages and code, that should help practitioners use some of our proposed methods.

The methods for screening active effects have been applied to stem-cell experiments conducted at the Melton Laboratory at Harvard. The project involved the design, implementation and analysis of an experiment to ascertain the causal effects of twenty-four chemical modulators on the conversion of pancreatic stem cells into insulin-generating pancreatic beta cells. Successful identification of the modulators (and the appropriate levels of their concentrations) that have a significant impact on the probability of conversion of a stem cell into a beta cell could lead to a significant breakthrough in research on stem-cell therapy for Type 1 diabetes. Despite the adoption of cutting-edge technology, these experiments were complex due to: (i) the large number of chemical modulators involved, which made the search space very large; (ii) the possible existence of complex interactions among these modulators, which, if exploited, could lead the experimenters to a significant enhancement of the probability of conversion; and (iii) the limited availability of a priori biological knowledge.
The major challenges in this project were therefore to choose an appropriate factorial design, to devise a meaningful statistical analysis consistent with the experimental design, to overcome unexpected disturbances that distorted some of the information, and finally to interpret the results in a way that sheds adequate light on the role of the modulators and their interactions and paves the way for follow-up experiments. The Bayesian analysis method developed by our research team was successfully applied to achieve this task.

The project has also had an impact on statistics education. The research findings have been presented in, and included in the syllabi of, an undergraduate and a graduate-level course on experimental design taught at Harvard University. Further, the research outputs constitute two chapters of the first draft of a new textbook titled "Experimental design: a randomization-based perspective," which is being co-authored by the PI and the Co-PI.

Two graduate students involved in this project have gained substantial experience and expertise in conducting methodological and applied research in statistics. Both students successfully defended their theses and were awarded the Ph.D. degree by Harvard University. They also gained professional experience by presenting some of their research findings at conferences.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer
Gabor J. Szekely
Harvard University
United States