This research project provides simultaneous confidence regions for various functional features in functional data analysis (FDA), together with asymptotic theory and a guide to practical implementation. Specifically, asymptotically correct confidence regions will be constructed for (1) the mean function of functional data and the coefficient function in the varying coefficient longitudinal regression model; and (2) the covariance function of functional data and the regression function in the functional linear model. For the simpler functions in (1), the investigator will employ both regression spline and local polynomial methods in order to establish rigorous asymptotic theory for both sparse and dense functional data. Results on strong approximation of partial sums by Brownian motions and advanced extreme value theory for sequences of non-stationary Gaussian processes will be applied to obtain distributional properties of the maximal deviation processes. For the more complicated functions in (2), the investigator will propose two-step estimators and show that they are asymptotically as efficient as certain "infeasible" analogs. Asymptotic distributions for maximal deviations are established for the "infeasible" estimators and are then inherited by the two-step estimators.

Functional data, also known as curve data, consist of collections of digitally recorded curves or surfaces, often with random errors. Such data abound in virtually all scientific disciplines, including but not limited to climatology, clinical studies, epidemiology, evolutionary biology and food engineering/science. The need to draw information out of a sample of curves, coupled with the unleashing of modern computing power, has made functional data analysis (FDA) one of the most active areas of contemporary statistics research. While multivariate statistics is about unknown vectors and matrices, FDA concerns unknown curves and surfaces, and inference about them is most naturally carried out with confidence regions. The methods developed by the investigator fill a major gap in current FDA methodology, which lacks procedures for drawing conclusions about an entire curve with quantifiable uncertainty. Codes written in common software packages such as Matlab or R will be freely distributed so that practitioners from academia and industry can analyze functional data in real time, with significance levels of their own choosing. Completing this project depends crucially on several capable Ph.D. students working under the investigator's supervision, so state-of-the-art research is integrated with the training of graduate students as future researchers, consistent with NSF's education goal.

Project Report

Functional data, also figuratively called curve data, consist of random collections of digitally recorded sample curves or surfaces, often contaminated with measurement errors. Since 2000, the study of functional data has been a focal point in mainstream statistics research, as such data pour in from virtually all scientific disciplines, including but not limited to climatology, clinical studies, epidemiology, evolutionary biology and food engineering/science. While there has been a massive amount of research in functional data analysis (FDA), only a small fraction of it addresses the critical issue of statistical inference, namely, drawing conclusions about an entire curve or surface of interest with quantifiable uncertainty. While classical mathematical statistics provides data analysts with confidence intervals for single parameters and joint confidence regions for multiple parameters, analogous constructs in the context of FDA hardly existed prior to this project. The most natural tools for drawing conclusions about unknown curves/surfaces are confidence bands/envelopes, which are simply two- or three-dimensional regions enclosed by an upper confidence curve/surface and a lower one, both constructed from the data. At the completion of this project, several types of simultaneous confidence bands have been made available for the mean of functional data of both sparse and dense type (i.e., each sample curve may have been recorded at a small or a large number of points). A simultaneous confidence envelope that is not affected by the mean function has also been provided for the covariance surface of functional data. Codes written in the popular software package R to compute confidence bands for sparse functional data will be available on the internet so that practitioners from academia and industry can use them freely for analyzing functional data in real time, with significance levels of their own choosing. Three Ph.D. students worked on the project and were trained to become capable researchers in FDA, consistent with NSF's education goal.
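
To make the notion of a simultaneous confidence band concrete, the following is a minimal sketch in Python for the mean of densely observed curves on a common grid. It is not the project's method: instead of the spline or local polynomial estimators and the extreme value theory described above, it uses the plain cross-sectional mean and a Gaussian multiplier bootstrap of the maximal deviation, which is a simpler, commonly used substitute. The function name `simultaneous_band`, its parameters, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def simultaneous_band(curves, alpha=0.05, n_boot=2000, seed=0):
    """Illustrative simultaneous band for the mean of dense functional data.

    curves: array of shape (n, m), n sample curves on a common grid of m points.
    Returns (mean, lower, upper, q), where q is the bootstrap critical value.
    Hypothetical helper: a multiplier-bootstrap sketch, not the project's code.
    """
    rng = np.random.default_rng(seed)
    n, m = curves.shape
    mean = curves.mean(axis=0)            # pointwise sample mean
    sd = curves.std(axis=0, ddof=1)       # pointwise sample standard deviation
    resid = (curves - mean) / sd          # standardized, centered curves
    # Multiplier bootstrap: perturb the centered curves with N(0, 1) weights
    # and record the maximal absolute deviation over the whole grid.
    xi = rng.standard_normal((n_boot, n))
    sup_stats = np.abs(xi @ resid / np.sqrt(n)).max(axis=1)
    q = np.quantile(sup_stats, 1 - alpha)  # critical value for the supremum
    half_width = q * sd / np.sqrt(n)
    return mean, mean - half_width, mean + half_width, q

# Simulated example (assumed model): smooth random curves plus measurement error.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
scores = rng.standard_normal((100, 1))
curves = (np.sin(2 * np.pi * t)
          + scores * np.cos(np.pi * t)
          + 0.3 * rng.standard_normal((100, 50)))
mean, lower, upper, q = simultaneous_band(curves, alpha=0.05)
```

The band is the region between `lower` and `upper`. Because the critical value `q` is calibrated to the maximum deviation over the entire grid, it is larger than the pointwise normal quantile, which is what allows a statement about the whole curve at once rather than about one point at a time.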

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
1007594
Program Officer
Gabor J. Szekely
Project Start
Project End
Budget Start
2010-09-01
Budget End
2013-08-31
Support Year
Fiscal Year
2010
Total Cost
$159,986
Indirect Cost
Name
Michigan State University
Department
Type
DUNS #
City
East Lansing
State
MI
Country
United States
Zip Code
48824