This research project will develop new computer-intensive statistical procedures that address current problems in the social sciences. The research will address two questions that are related by common principles and methodologies. First, the project will develop new methods for drawing inferences from ranked data that do not rely on strong model assumptions. Data from different populations (such as countries, schools, or hospitals) are commonly ordered by some performance measure, from best to worst, in the form of ranks. Improved methods for ranked data will have many applications in the social sciences, including the analysis of countries ranked in reading, math, and science and the ranking of neighborhoods by intergenerational income mobility. Second, the project will address the 'hot hand fallacy' and the human misperception of randomness. The common understanding of streaks in small samples has been challenged recently, and this research will provide rigorous statistical tools to address this controversy. In addition, graduate students will participate in the research process, and the project will develop free and easily accessible software.
The problems to be addressed in this project are related by common inferential methodologies that will bring sound principles to areas in need of formal statistical analysis. So that inferential procedures do not rely on unverifiable model-based assumptions, the project will use computer-intensive methods of statistical inference, such as resampling, bootstrap, and randomization methods. The computational procedures to be developed will be accompanied by theoretical results that justify their use. The problems will require novel insights in order to establish rigorous statistical properties so that the methods may be applied safely in practice. In addition, the project will draw heavily on the literature in multiple testing and simultaneous inference. For example, inference for ranks will result in the construction of simultaneous confidence regions for ranks with guaranteed error control. This will allow researchers to determine whether empirical rankings represent real differences between populations or are merely artifacts of the data. While the problems to be addressed stem from specific applied questions, they will require solutions that are of fundamental importance in statistics. These open problems are exciting and challenging not only from the point of view of mathematical statistics, but also because burgeoning applications demand new statistical methodology.
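As a rough illustration of what a randomization method looks like in the streak setting, the sketch below runs a simple permutation test on a sequence of hits (1) and misses (0). It is not the project's procedure: the function names `longest_hit_streak` and `permutation_p_value` are hypothetical, and the longest-streak statistic is an assumption chosen only because it is easy to state. The idea is that, if outcomes are exchangeable (no 'hot hand'), every reordering of the observed sequence is equally likely, so the observed statistic can be compared with its distribution over random shuffles.

```python
import random


def longest_hit_streak(shots):
    """Length of the longest run of consecutive hits (1s) in a 0/1 sequence."""
    best = current = 0
    for s in shots:
        current = current + 1 if s == 1 else 0
        best = max(best, current)
    return best


def permutation_p_value(shots, statistic=longest_hit_streak,
                        n_perm=10_000, seed=0):
    """One-sided Monte Carlo permutation p-value for 'streakier than chance'.

    Assumption for this sketch: under the null hypothesis the outcomes are
    exchangeable, so shuffling the sequence preserves its distribution.
    """
    rng = random.Random(seed)
    observed = statistic(shots)
    shuffled = list(shots)  # work on a copy; the input is left untouched
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            at_least_as_extreme += 1
    # Counting the observed sequence itself keeps the p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up data for illustration only.
    shots = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0]
    print(permutation_p_value(shots))
```

Because the number of hits is held fixed across shuffles, the test does not require a model for the underlying hit probability, which is the sense in which such procedures avoid unverifiable model-based assumptions.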
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.