The proposal aims to develop new statistical theory and methodology for dimension reduction in high-dimensional non-regular models, which allow for discontinuity with respect to a subset of the parameters or covariates. Such models arise naturally in applications across many fields, including statistics, biostatistics, climate science, marketing research, management, economics and finance. They can capture important features of the data structure, and of the association between explanatory and response variables, that neither low-dimensional nor regular models alone can reproduce. This proposal focuses primarily on threshold models, an important class of non-regular models with a wide variety of applications in statistics, biostatistics and economics. While the literature on threshold models for low-dimensional data is comprehensive, statistical theory and methods for threshold models applied to high-dimensional data remain underdeveloped because of four central challenges: (I) nonregularity of the estimation problem, (II) increasing dimensionality, (III) unknown or incomplete distributions of the response variables, and (IV) computational difficulties. Building on penalization techniques, a number of related research topics are proposed for investigation. New tools for statistical inference, together with computational algorithms, will be developed for non-regular models applied to large, high-dimensional data such as brain imaging data.
These new developments will allow scientists to analyze data efficiently, with substantially greater flexibility and interpretability and with reduced modeling bias. In addition, the investigator will integrate new mathematical, probabilistic and computational tools with those used in science and engineering. Dissemination of these developments will foster new discoveries and strengthen interdisciplinary collaborations. The research will also serve an educational purpose through multidisciplinary courses on state-of-the-art data mining and machine learning, benefiting the training of undergraduate and graduate students and of underrepresented minorities.
During the entire period of the award, the following outcomes were obtained.

1. Research and education activities: Five PhD students, Mr. Yi Chai, Mr. Xiao Guo, Mr. Lilun Du, Ms. Chen Cheng and Mr. Shengji Jia, have been working with the PI on projects related to the proposal. Yi Chai has completed his PhD degree, and Lilun Du and Xiao Guo have passed their oral preliminary examinations. The other students are making good progress. I have given invited presentations on research related to the proposal at about 28 conferences, seminars and colloquia.

2. Research papers:
The paper "Local tests for identifying anisotropic diffusion areas in human brain on DTI" by Yu, T., Zhang, C.M., Alexander, A.L. and Davidson, R.J. has been published in the Annals of Applied Statistics (2013).
The paper "Single-index modulated multiple testing" by Du, L. and Zhang, C.M. has been published in the Annals of Statistics (2014).
The paper "Robust-BD estimation and inference for varying-dimensional general linear models" by Zhang, C.M., Guo, X.(s), Cheng, C. and Zhang, Z.J. has been published in Statistica Sinica (2014).
The paper "Estimation of the error auto-correlation matrix in semiparametric model for fMRI data" by Guo, X. and Zhang, C.M. has been accepted by Statistica Sinica.
The paper "Graphical-model based multiple testing under dependence with applications to genome-wide association studies" by Liu, J., Zhang, C.M., Burnside, E., McCarty, C., Peissig, P. and Page, D. (2012) has been accepted at the 28th Conference on Uncertainty in Artificial Intelligence (acceptance rate < 96/304 ≈ 31.6%; further selected for oral presentation, 24 out of 304).
The paper "High-dimensional structured feature screening using binary Markov random fields" by Liu, J., Zhang, C.M., McCarty, C., Peissig, P., Burnside, E. and Page, D. (2012) has been published in the Journal of Machine Learning Research Workshop and Conference Proceedings, Volume 22 (AISTATS 2012, the Fifteenth International Conference on Artificial Intelligence and Statistics), pp. 712-721, 2012 (acceptance rate < 134/400 = 33.5%).