This research project aims to develop new statistical theory, methods, and computing algorithms to solve practical problems in which the data present distinctive features such as large volume, high velocity of change, and highly heterogeneous information across individuals. The traditional one-model-fits-all paradigm may lack sufficient power to detect important predictors for heterogeneous subgroups. This research aims to develop alternative methods that are applicable to electronic health record data and valuable for assigning personalized treatments for more effective medical care. It is anticipated that the project will stimulate interdisciplinary collaborations with scientists from disparate fields and that the work will also have applications in marketing, business, and financial services. The software under development will be disseminated to facilitate applications to large-scale complex data and will be made available to industry in a timely manner to maximize its impact on society. Training of graduate students through involvement in the research is part of this project.
This project aims to develop a new collaborative filtering method that uses cluster information from users and items to provide more efficient recommender systems. The research also targets the development of personalized variable selection, while improving the estimation efficiency of the personalized variable coefficients and the prediction power. In addition, a mixed-effects estimating equation approach will be developed to reduce the estimation bias for informative missing data. Another research goal is to develop efficient computational algorithms and tools applicable to large-scale complex data. Each component of the research plan covers a range of topics, from methodological and computational development to applications in real-world problems. The project will also help to tackle fundamental questions in statistical science and will stimulate interest from large groups of scientists in the fields of recommender systems, random effects modeling, high-dimensional model selection, subgrouping and clustering, longitudinal/correlated data, informative missing data, and refreshment sampling. The development of advanced optimization techniques, algorithms, and computational technology will be valuable for other types of complex data problems as well.