The analysis of high-dimensional data sets now commonly arising in scientific investigations poses many statistical challenges not present in smaller scale studies. Extracting information with precision from such data is becoming ever more important. This FRG proposal is the PIs' unified effort to respond to these pressing scientific needs. Specifically, the goals are to develop a comprehensive theoretical framework and general methodologies for estimating a large covariance matrix and its functionals and for functional data regression where the predictors and/or the responses involve functional measurements, and to address a wide range of important applications in biomedical studies.
The statistical and scientific objectives outlined in this proposal are at the intellectual center of a rapidly growing field in statistics and biostatistics. The new technical tools, inference procedures, and computing algorithms for analyzing high-dimensional data will greatly facilitate scientific investigations in a wide range of disciplines, including astronomy, biology, chemistry, bioinformatics, and particularly medicine. The proposed efficient analytical procedures hold great potential for deriving more accurate prediction rules for clinical outcomes based on new biological and genetic markers and thus may lead to a better understanding of disease processes. Research results from this proposal will be disseminated through workshops and seminar series so that the methods are publicly available to researchers in other disciplines. Software tools developed will be made freely and publicly available as open source code. The proposed project will also bring high-quality training to students and postdoctoral researchers.
The research developed in this proposal aims to significantly advance the theoretical understanding and methodological development of high-dimensional statistical inference and the novel application of these methods to clinical studies. The three co-PIs made substantial progress on the proposed aims, including theoretical and methodological developments related to large-scale covariance matrix estimation and testing, functional data analysis, and prediction with high-dimensional covariates, as well as a wide range of clinical questions related to high-dimensional inference that arise in medical research. Specifically, for clinical applications, we have successfully developed statistical algorithms that can (i) identify the set of markers that are associated with various types of disease outcomes and (ii) provide more precise prediction of an individual's risk of disease by efficiently utilizing high-dimensional predictors such as genomic features. These methods can be used to derive accurate risk prediction rules and hence may ultimately lead to better disease monitoring strategies. For example, high-risk patients can be recommended for more intensive monitoring and additional testing, while low-risk patients can be managed with more cost-effective measures. These methods have also been successfully applied to electronic medical record studies to identify patients with various diseases. An important aspect of this interdisciplinary project is the transferability of the proposed methods across disciplines. To encourage active interaction between our team and those working on high-dimensional data analyses in other disciplines, we have organized interdisciplinary seminar series at Penn, Yale, and Harvard to bring together researchers from around the country interested in this emerging and exciting research area.
In addition to the seminar series, we also organized two high-dimensional data workshops that brought together experts from around the world to share the most recent results achieved by researchers in various fields.