Many practical problems involve prediction where the main interest lies at the level of an individual subject (for example, in personalized or precision medicine) or a small sub-population (for example, a small community). In recent years, new and challenging problems of this kind have emerged from diverse fields such as business, social sciences, and health sciences. Examples include predicting a health outcome for a new patient, or predicting a new school's response to efforts to educate children about smoking prevention. In previous work on a method called classified mixed model prediction (CMMP), the investigators have shown that in such cases it is possible to make substantial gains in prediction accuracy by identifying the class to which a new subject belongs. However, the scenarios under which CMMP currently operates are somewhat constrained, and many real-life situations fall outside its scope. Given the substantial gains in accuracy that are possible, it would be very valuable to develop further methodology and computational advances to deepen knowledge in this area.
This project aims to extend the classified mixed model prediction method to other types of subject-level prediction problems and to develop new inferential methods based on the CMMP idea, so that the method becomes truly useful in practical situations. The basic idea of CMMP is to create a "match" between the group or cluster in the population for which one wishes to make predictions and one of the known groups or clusters in a (massive) training dataset. Once such a match is established, the traditional mixed model prediction method can be used to make accurate predictions. The practical challenges to be addressed in this project include i) how to deal with training data with unknown grouping; ii) how to deal with sparse, high-dimensional covariates; iii) how to make better use of covariate information to improve the accuracy of CMMP; and iv) how to provide accurate measures of uncertainty for CMMP-type predictions. Two important areas of application will be investigated. One is in precision medicine and health disparities, focusing on the prediction of epigenetic markers using high-dimensional genotype profiles. The other comes from the area of family economics, using data from a large survey in China, where predictions at finer levels of resolution (e.g., households) are of primary interest. Both applications will leverage important collaborations with practitioners and thus increase the impact of the work.
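The matching idea described above can be illustrated with a minimal sketch. It is not the investigators' implementation. It assumes a simple random-intercept model, treats the variance components `var_a` and `var_e` as known, and estimates the fixed effects by ordinary least squares as a simplification. The function names `fit_random_intercepts` and `cmmp_predict` and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np

def fit_random_intercepts(y, X, groups, var_a, var_e):
    """Estimate fixed effects by OLS (a simplification) and shrink each
    training group's mean residual toward zero, BLUP-style.
    var_a and var_e are assumed known here; in practice they are estimated."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    labels = np.unique(groups)
    alpha = np.empty(len(labels))
    for k, g in enumerate(labels):
        r = resid[groups == g]
        n = r.size
        alpha[k] = n * var_a / (var_e + n * var_a) * r.mean()
    return beta, labels, alpha

def cmmp_predict(y_new, X_new, beta, labels, alpha):
    """Match the new observations to the training group whose estimated
    intercept is closest to their mean residual, then predict the
    mixed effect (fixed part plus the matched group's intercept)."""
    fixed = X_new @ beta
    mean_resid = np.mean(y_new - fixed)
    k = int(np.argmin((mean_resid - alpha) ** 2))
    return labels[k], fixed + alpha[k]

# Simulated illustration: three training groups with well-separated intercepts.
rng = np.random.default_rng(0)
true_beta = np.array([1.0, 2.0])
true_alpha = np.array([-5.0, 0.0, 5.0])
groups = np.repeat(np.arange(3), 50)
X = np.column_stack([np.ones(150), rng.normal(size=150)])
y = X @ true_beta + true_alpha[groups] + rng.normal(size=150)
beta, labels, alpha = fit_random_intercepts(y, X, groups, var_a=25.0, var_e=1.0)

# New observations that actually come from the third group (label 2),
# although that membership is not given to the predictor.
X_new = np.column_stack([np.ones(5), rng.normal(size=5)])
y_new = X_new @ true_beta + true_alpha[2] + rng.normal(size=5)
matched, pred = cmmp_predict(y_new, X_new, beta, labels, alpha)
```

In this sketch the match uses only the observed responses of the new subject or group. Challenges i) through iv) above concern precisely the situations such a simple version does not cover, such as unknown training groups, high-dimensional covariates, and uncertainty measures.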
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.