The goal of the project is to study the fundamental role of geometry in statistics and to utilize it for learning and inference. More specifically, the investigator proposes to (1) study the role of geometry in statistical inference for complex data, in particular manifold-valued data that are now routinely collected in many fields of science and engineering; and (2) investigate the role of geometry in high-dimensional data analysis, where the data-generating process often centers around some lower-dimensional geometric space. The central theme of this program is that geometry is inherently present in the data, whether known or to be learned, and should be utilized for efficient and reliable statistical learning and inference. The investigator aims to make fundamental mathematical, statistical and algorithmic advances in complex data analysis and high-dimensional data analysis. In addition, the investigator proposes a comprehensive and detailed educational and training program for graduate students, undergraduate students and high school students that is integrated into the research program.

Complex data are now routinely collected in many scientific fields. One example comes from diffusion tensor imaging (DTI) in neuroimaging, which obtains local information of neural activity through 3-by-3 positive definite matrices. DTI has clinical applications in the study and treatment of neurological disorders such as schizophrenia, as well as in detecting subtle abnormalities related to a variety of diseases (including stroke, multiple sclerosis and dyslexia). Other examples of complex data include digital images in machine vision, where an image can be represented by a set of landmarks that form a shape. One may also encounter data that are stored in complex forms such as subspaces, surfaces, curves and networks. The investigator will characterize the structure or geometry of complex data and incorporate that geometry in developing valid statistical models for inference. In addition to complex data, high-dimensional data are commonly collected across many disciplines such as biology, public health and neuroscience. Learning the often lower-dimensional geometry of high-dimensional data is essential for accurate statistical inference and decision making. The investigator will utilize this geometry in developing valid statistical methods, which will be applied to medical and neuroscience data, with potentially far-reaching impact in applied fields. In particular, the practical impact of the statistical methodologies from the project will be evaluated in the context of the Alzheimer's Disease Neuroimaging Initiative (ADNI) database and an Attention Deficit Hyperactivity Disorder (ADHD) data set. By completing the proposed program, the investigator expects to better serve society and advance science: applying the developed models and methods in important fields such as medical diagnostics should enable accurate prediction, classification or detection of diseases and brain disorders.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1654579
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2017-07-01
Budget End: 2022-06-30
Support Year:
Fiscal Year: 2016
Total Cost: $320,145
Indirect Cost:

Name: University of Notre Dame
Department:
Type:
DUNS #:
City: Notre Dame
State: IN
Country: United States
Zip Code: 46556