In the big data era, massive data with complex structures are being generated at an explosive rate. Non- and semi-parametric models are powerful statistical tools for exploring nonlinear patterns hidden in complex data and have been used in a wide range of fields, such as biomedical science, geology, engineering, and the social sciences. However, traditional non- and semi-parametric methods are limited in their ability to handle massive, high-dimensional data. The goal of the proposed research is to develop effective tools for dynamic estimation and variable selection for non- and semi-parametric models in the massive complex data setting. The proposed research will generate new methods and theory and will provide practitioners in different fields with tools to better understand complex and dynamic structures in massive data.
As an effective dimension reduction tool, variable selection is very useful for modeling high-dimensional data. In recent years, the properties of penalized methods have been well investigated for both linear and semi-parametric models. However, those methods generally do not allow the set of selected variables to change dynamically with other variables. For many longitudinal or spatial data sets encountered in practice, there is a need for a general framework for dynamic variable selection that allows possibly different sets of relevant variables to be selected in different time periods or at different spatial locations. The objectives of the proposed research include: (1) to develop a novel procedure for dynamic variable selection in the varying coefficient model; (2) to establish large-sample properties ensuring that the proposed method provides an optimal solution when the sample size is sufficiently large; (3) to develop an efficient algorithm that achieves a higher percentage of correct fitting even when the dimension of the covariates is large; (4) to apply dynamic variable selection to the study of time-varying network data; and (5) to develop an innovative penalized spline procedure with triangulations that serves both for dynamic local signal detection and for efficient estimation of sparse non-parametric functions on irregular domains.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.