Functional data analysis (FDA) is an emerging area in statistics that deals with samples of random functions. In practice, measurements of these random functions are often taken intermittently at discrete time points and may be subject to random noise. This results in two scenarios: one with complete or dense recordings, termed 'functional data', and another with sparse measurements at discrete time points, termed 'longitudinal data' because such data are typical for longitudinal studies. Statistical analysis for these two types of data, and the respective theory, differ substantially. The investigator argues that it is not necessary to develop methodology for functional and longitudinal data separately, as is common practice. Instead, a unified approach that handles both data structures on a single platform will be developed. The approach is rooted in the dimension reduction technique of principal component analysis, which has been extended to functional/longitudinal data and termed 'functional principal component analysis (FPCA)'. Existing FPCA approaches assume that data come from a single population and thus do not take advantage of available information on covariates, which can be either time-independent or time-dependent. The first theme/aim of the proposal is to fill this gap in the literature by adjusting FPCA for covariate information, for both functional and longitudinal data. A recent FPCA approach developed by the PI and colleagues facilitates this extension, which is coupled with nonparametric and semiparametric approaches.

A second theme of the proposal is to apply the functional methodology of Aim 1 to unstructured (non-functional) high-dimensional data by reordering the variables and then 'stringing' them into data that can be interpreted as functional data. 'Stringing' can be accomplished through multidimensional scaling. If sufficiently strong correlations exist among the variables of the unstructured data, an ordering can be found in which neighboring variables are highly correlated. This approach turns the curse of high-dimensional data into a blessing, as functional data analysis inherently takes advantage of the adjacency of high-dimensional, densely recorded data for each subject. The overarching goal of this project is to develop a cohesive framework for several types of high-dimensional data through a combination of FDA and Stringing approaches. The two themes are thus tightly connected and are further explored in Aim 3, which is to develop and disseminate software for these methods.
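As a rough illustration of how Stringing and FPCA might fit together, the sketch below orders the columns of an unstructured data matrix with one-dimensional classical multidimensional scaling and then runs a basic FPCA for densely recorded data. The function names `string_order` and `fpca_dense`, the correlation-based dissimilarity, and the equally spaced grid are all assumptions made for illustration. This is not the PI's implementation, and it does not cover sparse longitudinal data or covariate adjustment.

```python
import numpy as np


def string_order(X):
    """Return a column ordering for X (n subjects x p variables).

    Illustrative assumption: variables are embedded on a line by classical
    MDS applied to the dissimilarity d^2 = 2 * (1 - correlation), so that
    highly correlated variables end up close to each other.
    """
    corr = np.corrcoef(X, rowvar=False)
    d2 = 2.0 * (1.0 - corr)                  # squared dissimilarities
    p = d2.shape[0]
    J = np.eye(p) - np.ones((p, p)) / p      # centering matrix
    B = -0.5 * J @ d2 @ J                    # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    # The leading eigenvector gives the one-dimensional MDS coordinate.
    coord = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    return np.argsort(coord)


def fpca_dense(X, grid, n_components=2):
    """Basic FPCA for curves observed densely on an equally spaced grid.

    Returns the mean curve, eigenvalues, eigenfunctions (columns of phi,
    scaled to unit L2 norm on the grid), and the component scores.
    """
    dt = grid[1] - grid[0]                   # assumes equal grid spacing
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance surface
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[top] * dt                  # eigenvalues of the operator
    phi = eigvecs[:, top] / np.sqrt(dt)      # sum(phi**2) * dt == 1
    scores = Xc @ phi * dt                   # Riemann-sum approximation
    return mean, lam, phi, scores
```

In this sketch, a data matrix with shuffled columns would first be reordered with `X[:, string_order(X)]` and the result passed to `fpca_dense` together with an artificial grid such as `np.linspace(0, 1, p)`. The ordering is only determined up to reversal, and a single MDS coordinate is a simplification that can fail when correlations between distant variables are weak.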

High-dimensional data are now common in many disciplines. This proposal focuses on two types of high-dimensional data: functional/longitudinal data and unstructured high-dimensional data. The combined approaches of Aims 1 and 2 can handle a large variety of high-dimensional data. The proposed research is motivated by real-world problems, such as those arising in the Baltimore Longitudinal Study or in gene expression studies listed in the databases of the National Center for Biotechnology Information. A significant portion of the problems originates from ongoing collaborations of the PI with biologists and demographers and aims to identify: (1) key biological and behavioral factors that contribute to longevity, (2) key risk factors for diseases, and (3) genes that are related to patient survival. The new approaches will help to shed light on important issues in many fields by overcoming the challenges posed by high-dimensional data.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
0906813
Program Officer
Gabor J. Szekely
Project Start
Project End
Budget Start
2009-09-01
Budget End
2014-08-31
Support Year
Fiscal Year
2009
Total Cost
$399,627
Indirect Cost
Name
University of California Davis
Department
Type
DUNS #
City
Davis
State
CA
Country
United States
Zip Code
95618