This research is motivated by numerous real-life problems whose modeling and analysis involve functional data, i.e., data where the measurements per subject/replicate correspond to values of a function (referred to as a sample trajectory). In particular, this research is motivated by functional clustering problems and by functional data that are dynamical in nature. Functional principal components analysis (FPCA) has been widely used in analyzing functional data. In spite of its success, FPCA tends to be inefficient if the geometry of the trajectory space is non-Euclidean, especially when sample trajectories are only observed at sparse sets of time points, as is the case in many scientific studies. Sources of such nonlinearity include, but are not limited to, the existence of underlying clusters of the sample trajectories and the sample trajectories being governed by a nonlinear dynamical system. The investigator proposes a new strategy (referred to as the local FPCA framework) for analyzing sparsely and noisily observed functional data. It aims to derive more efficient localized representations of sample trajectories that take into account the geometric structure of the trajectory space. This framework combines the principles underlying functional principal components analysis with the notions of functional clustering and nonlinear dimensionality reduction. Specific aims of this research include: (a) Develop a local FPCA framework which clusters the sample trajectories into homogeneous subgroups and applies FPCA within each cluster to derive more efficient representations of the sample trajectories. (b) Fit ordinary differential equation models with random parameters by a model-based local FPCA approach. (c) Study theoretical aspects of the proposed methods and apply them to various scientific problems.
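
For readers who prefer notation, the setting above can be written out as follows. This is only a sketch in generic notation: the symbols $Y_{ij}$, $t_{ij}$, $\varepsilon_{ij}$, $\mu$, $\phi_k$, $\xi_{ik}$, the cluster label $c(i)$ and the truncation level $K_c$ are introduced here for illustration and do not come from the abstract. The first line is the sparse, noisy observation model, the second is the standard FPCA (Karhunen-Loève) expansion, and the third is one plausible way to express a cluster-specific ("local") version of it.

```latex
\begin{align*}
Y_{ij} &= X_i(t_{ij}) + \varepsilon_{ij}, \qquad j = 1, \dots, m_i,
  && \text{(sparse, noisy observations of trajectory } X_i\text{)} \\
X_i(t) &= \mu(t) + \sum_{k \ge 1} \xi_{ik}\, \phi_k(t),
  && \text{(global FPCA expansion)} \\
X_i(t) &\approx \mu_{c(i)}(t) + \sum_{k=1}^{K_{c(i)}} \xi_{ik}\, \phi_{c(i),k}(t),
  && \text{(local expansion within cluster } c(i)\text{)}
\end{align*}
```

Under this reading, the local representation is more efficient when a small $K_{c(i)}$ within each cluster achieves the accuracy that the global expansion reaches only with many components.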

This research will produce a new set of statistical tools for scientists working in fields such as plant biology, ecology and epidemiology who must analyze longitudinal/functional data. In particular, this research is a stepping stone toward understanding complex dynamical systems. The PI is collaborating with scientists on studying HIV disease dynamics at a population level, and this research helps achieve a better understanding of these systems, which hold important implications for the pathologies of AIDS. The computational and analytical tools resulting from this research are also likely to stimulate further studies in related fields. Moreover, this research develops open source software that is freely available to the whole scientific community. Facing complex data and challenging questions, a new generation of researchers needs to be trained in an interdisciplinary manner. The broader training component includes exposing statistics/biostatistics students to real scientific problems involving functional data. In turn, through collaborations, scientists working in related fields are able to enhance their quantitative analysis skills.

Project Report

As more and more data are collected, there is a pressing need for methods that can retrieve information which is easily understandable to humans as well as faithful to the data. This award aimed at developing methods that can be used to understand and analyze data which are both large and complex. Confronted with high-dimensional data (e.g., data with many variables, or data in the form of functions or images), a straightforward idea is to represent them by simpler objects (e.g., two- or three-dimensional linear objects). This process (referred to as dimension reduction) facilitates visualization, understanding and modeling of the data. The best-known dimension reduction tool is principal component analysis (PCA), in which the data are linearly transformed into a handful of principal components. However, many data are not only big, but also complex, a notable example being data generated by nonlinear dynamics (e.g., the growth dynamics of biological organisms). In such a case, PCA is not effective as it fails to recognize the intrinsic nonlinear nature of the data. Under this award, a nonlinear dimension reduction framework has been developed, targeting high-dimensional data which actually reside in a lower-dimensional nonlinear subspace (e.g., a curved line or surface). The idea is to subdivide the data (through a clustering procedure which recognizes the geometry of the data) into nearly linear pieces and then apply PCA within each piece. This new framework has been shown to be effective in learning the intrinsic structure of the data and in representing data in a faithful and efficient way. It is widely applicable to various types of data including multivariate data, functional data and longitudinal data.
This award also developed novel methods for data generated by nonlinear dynamical systems. The study of nonlinear dynamics is of great importance to various scientific fields, including the study of the growth of biological organisms, the growth of socio-economic indices and the progression of certain diseases. Most previous work in statistics has focused on learning such dynamics based on data from a single subject (e.g., the growth curve of one individual). This award aimed at population-level modeling which takes into account both the commonality of the dynamics across subjects and the individual-level variations. Such methods are more useful and efficient in modeling and understanding the system because they aggregate information across the entire population while allowing for subject-specific features. The methods developed in this award have been applied to model human growth dynamics and housing prices, which provided insights and revealed new features, leading to a better understanding of the underlying mechanisms governing these processes. Finally, this award helped support several graduate students in their dissertation research and contributed to training the next generation of statisticians to be better equipped to face the challenges and opportunities in an era of information and data explosion.
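
The subdivide-then-PCA idea described in this report can be illustrated with a small, self-contained Python sketch. It is not the method developed under the award: plain k-means is swapped in for the geometry-aware clustering procedure, the data are a synthetic V-shaped curve in two dimensions, and every name (`kmeans`, `first_pc_share`, the noise level 0.02) is a hypothetical choice made for this example. The sketch compares how much variance a single principal component captures globally with how much it captures inside each cluster.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialisation (a stand-in for the
    geometry-aware clustering mentioned in the report). Returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels


def first_pc_share(points):
    """Fraction of total variance explained by the first principal component,
    using the closed-form top eigenvalue of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    top = (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)) / 2
    return top / (sxx + syy)


# Synthetic data (assumption): noisy points along the curve y = |x| on [-1, 1].
rng = random.Random(0)
n = 201
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
points = [(x, abs(x) + rng.gauss(0, 0.02)) for x in xs]

labels = kmeans(points, k=2)
global_share = first_pc_share(points)
local_shares = [
    first_pc_share([p for p, lab in zip(points, labels) if lab == j])
    for j in range(2)
]

print("variance share of one global component:", round(global_share, 3))
print("variance share of one component per cluster:",
      [round(s, 3) for s in local_shares])
```

On this toy curve a single global component leaves a noticeable part of the variance unexplained, while one component per cluster captures nearly all of it, which is the sense in which a localized representation is more efficient. For functional or sparsely observed data the PCA step would have to be replaced by an FPCA procedure suited to that setting.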

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1007583
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2010-07-15
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2010
Total Cost: $149,685
Indirect Cost:
Name: University of California Davis
Department:
Type:
DUNS #:
City: Davis
State: CA
Country: United States
Zip Code: 95618