One of the most intriguing challenges in modern science is understanding the human brain. In particular, scientists want to understand the differences between the brains of people with neurological disorders and those without. In brain imaging analysis, scientists collect data in the form of images that are used to compare the normal aging process with the development of neurological disorders. Through this project, the PIs seek to develop a toolkit comprising a set of novel statistical methods, theories, and algorithms for the analysis of brain imaging data, as well as similar data that arise in a variety of scientific and business fields. The proposed research program is expected to make significant contributions on two fronts: responding in a timely manner to the growing needs and challenges of array data analysis, and providing a class of associated methodology that advances the statistical discipline. The research proposed in this project is to be disseminated through the investigators' close collaborations with neuroscientists, as well as through substantial educational and outreach activities.
Multidimensional array, or tensor, data now arise frequently in a wide range of scientific and business fields. Aiming to address some of the most pressing questions in tensor data analysis, this research will integrate advanced statistical modeling devices with modern computational techniques to develop a set of novel tensor regression methods. Although there is an enormous body of literature on high-dimensional regression analysis, nearly all of it deals with a vector response or predictor. Naively turning a tensor into a vector would result in ultrahigh dimensionality, destroy the inherent structural information embedded in the tensor, and often render classical methods inadequate. This research will develop methods and tools for regression modeling of tensor responses or predictors that both effectively tackle the high dimensionality and preserve the tensor structure. Three sets of problems are to be investigated: (1) tensor response regression with envelope, aiming to address questions such as identifying brain regions that exhibit different activity patterns between the disease group and the general population after controlling for a set of potential confounding variables; (2) tensor predictor regression with envelope, aiming at questions of using brain images to diagnose neurodegenerative disorders and to predict the onset of neuropsychiatric diseases; and (3) covariance matrix response regression with envelope, aiming to understand brain network alterations and to establish their associations with pathological phenotypes. The core idea underlying all of these aims is the adoption of a generalized sparsity principle and the development of a class of tensor envelope methods. The classical sparsity principle assumes that a subset of individual variables is irrelevant, and various penalty functions are employed to induce such sparsity. By contrast, the generalized sparsity principle assumes that certain linear combinations of variables are irrelevant, and the proposed envelope methods simultaneously identify and exclude such irrelevant information to achieve substantially improved estimation accuracy and efficiency.
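To make the dimensionality argument concrete, the following is a minimal sketch that counts regression coefficients when a tensor response is flattened into a vector, and compares that with a hypothetical mode-wise reduction in which each mode is projected onto a small basis. The image size, reduced dimensions, number of predictors, and function names are all assumptions chosen for illustration; this is a raw parameter count, not the investigators' estimation procedure.

```python
from math import prod


def vectorized_coefficients(dims, n_predictors):
    """Coefficients needed when the tensor response is flattened into a vector."""
    return prod(dims) * n_predictors


def mode_wise_coefficients(dims, reduced_dims, n_predictors):
    """Coefficients under a hypothetical mode-wise reduction.

    Each mode k keeps a d_k x u_k basis matrix, and the regression acts only
    on the reduced u_1 x ... x u_m core. This is a raw count that ignores the
    orthogonality constraints on each basis.
    """
    basis = sum(d * u for d, u in zip(dims, reduced_dims))
    core = prod(reduced_dims) * n_predictors
    return basis + core


dims = (64, 64, 64)        # assumed image size, for illustration only
reduced_dims = (5, 5, 5)   # assumed reduced dimension per mode
n_predictors = 3           # assumed, e.g. a group indicator plus two covariates

full = vectorized_coefficients(dims, n_predictors)
reduced = mode_wise_coefficients(dims, reduced_dims, n_predictors)
print(full, reduced)
```

Under these assumed sizes, the flattened model has several hundred thousand coefficients while the structured one has on the order of a thousand, which is the sense in which preserving the tensor structure tackles the high dimensionality.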