The analysis of high-dimensional data sets now commonly arising in scientific investigations poses many statistical challenges not present in smaller-scale studies. Extracting information with precision from such data is becoming ever more important. This FRG proposal is the PIs' unified effort to respond to these pressing scientific needs. Specifically, the goals are to develop a comprehensive theoretical framework and general methodologies for estimating a large covariance matrix and its functionals and for functional data regression where the predictors and/or the responses involve functional measurements, and to address a wide range of important applications in biomedical studies.

The statistical and scientific objectives outlined in this proposal are at the intellectual center of a rapidly growing field in statistics and biostatistics. The new technical tools, inference procedures, and computing algorithms for analyzing high-dimensional data will greatly facilitate scientific investigations in a wide range of disciplines, including astronomy, biology, chemistry, bioinformatics, and particularly medicine. The proposed efficient analytical procedures hold great potential for deriving more accurate prediction rules for clinical outcomes based on new biological and genetic markers, and thus may lead to a better understanding of disease processes. Research results from this proposal will be disseminated through workshops and seminar series so that the methods will be publicly available to researchers in other disciplines. Software tools developed will be made freely and publicly available as open source code. The proposed project will also bring high-quality training to students and postdoctoral researchers.

Project Report

Covariance and precision matrices play a central role in multivariate statistical analysis. A wide range of statistical methodologies, including clustering analysis, principal component analysis, linear and quadratic discriminant analysis, Gaussian graphical models, and regression analysis, require knowledge of the covariance or precision structure. The standard covariance matrix estimator has been used frequently in practice when analyzing high-dimensional data, where it may result in poor performance and invalid conclusions. The last decade has witnessed tremendous advances in the understanding of high-dimensional structured covariance and precision matrix estimation. We proposed novel regularized procedures for a wide range of matrix structures and carried out fundamental studies to understand optimality in various estimation settings. These methods have found applications in many fields, including analysis of gene expression arrays, astrophysics, climate studies, functional magnetic resonance imaging, risk management, and portfolio allocation. To overcome the difficulty associated with high dimensionality, various covariance and precision matrix structures are assumed, such as Toeplitz or bandable covariance, sparse covariance, factor models, sparse principal components analysis, sparse precision matrices, and latent graphical models. We obtained various optimization results in all of those settings. The wide application of high-dimensional inference ensures that the progress we made will have a great impact on the broad scientific community. It provides theory, methodology, applications, and technical tools to researchers in other scientific fields who collect and analyze high-dimensional data sets. These fields include astronomy, biology, chemistry, bioinformatics, and various clinical research areas in general.
Research results have been disseminated through workshops and seminar series so that the methods are publicly available to researchers in other disciplines. In carrying out the project, we provided high-quality training to students and postdoctoral researchers and conducted outreach programs, particularly to attract and nurture female and minority students in Applied Mathematics, Statistics and Biostatistics.
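To make the contrast between the standard estimator and a regularized one concrete, the following Python sketch compares the sample covariance matrix with an entrywise hard-thresholded version when the dimension exceeds the sample size. It is an illustration only, not the project's software: hard thresholding is used here as one simple, well-known regularizer for sparse covariance structure, and the function names, the simulated identity covariance, the dimensions, and the threshold constant are all assumptions made for the example.

```python
import numpy as np

def sample_covariance(X):
    """Standard (unregularized) sample covariance of an n-by-p data matrix."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def hard_threshold(S, lam):
    """Zero out off-diagonal entries with absolute value below lam.

    The diagonal (the variances) is left untouched.
    """
    T = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

# Simulated data: n observations of a p-dimensional vector with p > n.
# The true covariance is the identity, an assumed sparse structure.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))

S = sample_covariance(X)

# Threshold of order sqrt(log(p) / n); the constant 2.0 is an
# illustrative choice, not a tuned or recommended value.
lam = 2.0 * np.sqrt(np.log(p) / n)
T = hard_threshold(S, lam)

# Spectral-norm distance of each estimate from the true covariance.
truth = np.eye(p)
err_sample = np.linalg.norm(S - truth, 2)
err_thresh = np.linalg.norm(T - truth, 2)
print(f"sample covariance error:      {err_sample:.3f}")
print(f"thresholded covariance error: {err_thresh:.3f}")
```

In this simulated setting the thresholded estimate is closer to the truth in spectral norm than the sample covariance. In practice the threshold level would typically be chosen from the data, for example by cross-validation.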

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0854975
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2009-08-01
Budget End: 2013-07-31
Support Year:
Fiscal Year: 2008
Total Cost: $330,000
Indirect Cost:
Name: Yale University
Department:
Type:
DUNS #:
City: New Haven
State: CT
Country: United States
Zip Code: 06520