With the recent explosion of scientific data of unprecedented size and complexity, dimension reduction is becoming a central ingredient of modern statistical analysis. This project aims to couple dimension reduction methodology with current statistical learning techniques, yielding a new class of flexible and effective dimension reduction solutions for modern data with both high dimensionality and complex structure. Building on this coupling, the investigator establishes a framework for dimension reduction that incorporates prior information about known structural relationships among the variables. Within this framework, the investigator plans to develop a family of dimension reduction solutions whose results are more accurate and more readily interpretable. Such a framework is expected to greatly facilitate the analysis of neuroimaging, climate, and genomic data, where prior structural information is often available.
Modern technologies routinely produce massive amounts of data, and this data deluge now engulfs every branch of science and public life. As a result, scientific progress depends heavily on the ability to process and analyze complex high-dimensional data. At the heart of these analyses are methods that reduce the dimensionality of the data, sometimes dramatically, either by identifying a small set of important variables or by obtaining a few combinations of the original measurements. This project aims to develop a host of novel dimension reduction methods to address these pressing challenges in high-dimensional data analysis. The proposed research is expected to make significant contributions on two fronts: enabling scientists to quickly and effectively extract useful information from massive data and, at the same time, benefiting the discipline of statistics with advances in theory, methods, and applications.
Modern scientific and business data are growing explosively, with unprecedented size and complexity. Dimension reduction plays a central role in the statistical analysis of such high-dimensional, large-scale, and complex-structured data. This project has combined dimension reduction with state-of-the-art machine learning techniques and has produced a new class of flexible and effective dimension reduction solutions. Specifically, the project has advanced methodological research in the areas of model-free variable screening and selection, support vector machine based linear and nonlinear dimension reduction, and dimension reduction that incorporates prior subject knowledge. The resulting dimension reduction outcomes are more flexible, accurate, and readily interpretable. The supported research has prompted and facilitated collaborative research in genetics and genomics studies, climate studies, and neuroimaging analysis. New collaborations have been established with the Pharmacometabolomics Research Networks and the North Carolina Biomedical Imaging Center. The award has also partially supported the exploration and pursuit of high-dimensional data analysis and methodology development in general. In particular, it has led to a new research direction in tensor regression with applications in brain imaging analysis. The award has led to multiple publications in statistical and scientific journals and has inspired follow-up research by other investigators in related areas. The proposed research has been disseminated into the thesis work of multiple Ph.D. students both inside and outside the PI’s home institution. Research materials have also been incorporated into a topics course on Big Data that was newly developed and partially supported by this award. The new course has been well received by both students and colleagues of the PI, and an article introducing the course was published in Amstat News.