This research project aims to develop new statistical theory, methods, and computing algorithms to solve practical problems in which the data present distinctive features such as large volume, high velocity of dynamic change, and highly heterogeneous information from different individuals. The traditional one-model-fits-all paradigm may not have sufficient power to detect important predictors for heterogeneous subgroups. This research aims to develop alternative methods that are applicable to electronic health record data and valuable for assigning personalized treatments that lead to more effective medical care. It is anticipated that the project will stimulate interdisciplinary collaborations with scientists from disparate fields and that the work will also have applications in marketing, business, and financial services. The software under development will be disseminated to facilitate applications to large-scale complex data and will be made available to industry in a timely manner to maximize its impact on society. The project also includes training of graduate students through involvement in the research.

This project aims to develop a new collaborative filtering method that uses cluster information from users and items to provide more efficient recommender systems. The research also targets the development of personalized variable selection, while improving both the estimation efficiency of the personalized variable coefficients and the prediction power. In addition, a mixed-effects estimating equation approach will be developed to reduce estimation bias for informative missing data. Another research goal is to develop efficient computational algorithms and tools applicable to large-scale complex data. Each component of the research plan covers a range of topics, from methodological and computational development to applications to real-world problems. The project will also help to tackle fundamental questions in statistical science and will stimulate interest from large groups of scientists in the fields of recommender systems, random effects modeling, high-dimensional model selection, subgrouping and clustering, longitudinal/correlated data, informative missing data, and refreshment sampling. The advanced optimization techniques, algorithms, and computational technology developed will be valuable for other types of complex data problems as well.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
1613190
Program Officer
Gabor Szekely
Project Start
Project End
Budget Start
2016-09-01
Budget End
2020-08-31
Support Year
Fiscal Year
2016
Total Cost
$250,001
Indirect Cost
Name
University of Illinois Urbana-Champaign
Department
Type
DUNS #
City
Champaign
State
IL
Country
United States
Zip Code
61820