Advances in data-collection technology, together with the growing integration of mathematics into the biological, oceanic/atmospheric, and psychosocial sciences, have produced a wealth of highly complex, very high-dimensional data sets in interdisciplinary research. The non-standard features of such data include non-normality, complex heterogeneity and dependence structures, high dimensionality, small sample sizes, and unbalanced designs. The investigator proposes more realistic statistical models for such data sets and develops advanced statistical methods for analyzing them. This project has five specific aims: (1) to expand the available modeling options by proposing fully nonparametric models for crossed and nested two-way random and mixed effects designs; (2) to construct statistical procedures, including robust rank-based procedures, for the common hypotheses of interest under each of these models; (3) to propose order thresholding (thresholding based on L-statistics) for reducing the dimensionality of the alternative hypothesis and for identifying the location of the signal; (4) to propose a bootstrap testing method that improves the accuracy of the test procedures; and (5) to explore applications of these test procedures to classification problems via the recently developed test-based classification method. The proposed models and methods differ fundamentally in approach from standard likelihood methods, existing nonparametric and semiparametric models, and Bayesian techniques.
The significance of this project stems from the fact that statistical analysis is the final, and often the most important, stage of many expensive scientific investigations. Is the concentration of certain contaminants in coastal waters trending up or down? Is such a trend the result of natural processes, or is it caused by human activity? Do gene expression levels differ across biological environments, and if so, which genes are responsible for the difference? Has the Gang Resistance Education and Training (G.R.E.A.T.) program been effective in reducing deviant and illegal activities among adolescents in urban areas? In the early detection of bioweapon use, is there a signal (a particular symptom occurring at rates above background), and if so, where is it located? The data collected to answer such questions typically exhibit highly non-standard features. The data collected by the Mussel Watch Project of NOAA's National Status and Trends Program, which monitors marine environmental quality, are a good example of the complex features such data can exhibit. Because of these features, the data may fail to satisfy the regularity conditions that existing models and methods require. Analyzing data under assumptions that do not hold may lead to incorrect conclusions about which factors and trends are statistically significant, or may fail to detect an existing signal or to identify the genes affected by a disease. The objective of this proposal is to develop advanced methods of data analysis based on realistic statistical models, together with software for implementing them.