This project has three goals. The first is to develop data-driven assessments of the complexity of data generators and of the complexity of the predictive techniques applied to them, and then to relate the two to each other. The expected outcome is a complexity matching principle between data generators and their predictors; the motivation is to speed the search for predictors with low generalization error. The second goal is to develop techniques for deriving modeling information from good predictors, so that statements can be made about the data generator beyond numerical prediction. The third goal is to apply these techniques to a complex data set for which a predictive approach is essential because the data's extreme complexity defies conventional modeling. The motivation is to verify that the complexity-based techniques give reliable inferences for an important question such as `which of those who have suffered a traumatic event are likely to develop post-traumatic stress disorder'.
The motivation for the overall project is to find ways to extract information from data so complex that conventional techniques are ineffective. Such data is becoming increasingly common as the number of data types grows and databases become more comprehensive. The problem with conventional techniques seems to be that they assume a physically meaningful model before there is a strong enough basis even to propose one. The approach here is significant because it is overtly predictive: instead of proposing models, one proposes predictors, which are easier to test, and then studies those predictors to make statements about whatever generated the data. This reverses the usual order, in which one models first and then predicts.
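To make the reversal concrete, the following is a minimal sketch of such a predict-first workflow, assuming generic tabular data; the synthetic data set, the random forest predictor, and the use of permutation importance are illustrative assumptions, not methods the project commits to.
\begin{verbatim}
# Minimal sketch of the predict-first workflow (illustrative assumptions:
# synthetic data, random forest, permutation importance).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.inspection import permutation_importance

# Stand-in for a complex data generator: labeled data of unknown structure.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: propose a predictor and test it directly via its estimated
# generalization error, without committing to a physically meaningful model.
predictor = RandomForestClassifier(random_state=0)
cv_error = 1.0 - cross_val_score(predictor, X_train, y_train, cv=5).mean()
print(f"estimated generalization error: {cv_error:.3f}")

# Step 2: study the fitted predictor to make statements about the data
# generator, e.g. which inputs the predictions appear to depend on.
predictor.fit(X_train, y_train)
result = permutation_importance(predictor, X_test, y_test,
                                n_repeats=10, random_state=0)
for idx in np.argsort(result.importances_mean)[::-1][:5]:
    print(f"feature {idx}: importance {result.importances_mean[idx]:.3f}")
\end{verbatim}
Here the cross-validated error plays the role of the directly testable quantity, and the permutation importances stand in for the modeling information one would subsequently read off the predictor.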