This research project will develop a unified set of statistical, computational, and software tools to address data mining and discovery science challenges in the analysis of complex and noisy high-dimensional data. Data with these characteristics are common in many fields, including the health sciences, economics, finance, and neuroscience. However, statistical methods to analyze these types of data have not kept up with the development of new technologies and new datasets. This project will develop robust data analysis methods that are scalable to large complex datasets. The ability to minimize the impact of high dimensionality, data complexity, and noisiness on data analysis will facilitate new discoveries in important areas such as health and economics. The project will conduct both theoretical and empirical studies. Publicly available software will be disseminated to complement the research activities. The investigator will mentor graduate students in statistics and social sciences and will seek to broaden the participation of underrepresented groups.
This project will develop multivariate rank-based methods that are robust to model misspecification, outliers, missing values, and data dependency. Three problems associated with rank-based inference for complex and noisy high-dimensional data will be addressed. First, multivariate rank-based statistical methods for robust functional principal component analysis will be developed. These new methods will improve on current functional principal component analysis tools for noisy data and will be applied to the analysis of physical activity data. Second, motivated in part by studies of complex and noisy stock market and neuroimaging data, the project will devise multivariate rank-based dependence measures as new measures of between-group dependence. The new measures will simultaneously provide nonlinear dependence measurement, consistency of testing, robustness, and distribution-freeness. Third, building on the second activity, the project will estimate group-level networks through multivariate rank-based dependence measures to characterize conditional rather than marginal independence structure.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.