As technology advances, scientists face increasingly high-dimensional and complex data. For example, genetic data from microarray experiments are very large, and new techniques are needed to identify the specific genes associated with various diseases. Longitudinal data that track many variables over time for millions of individuals pose further challenges. Such data call for new statistical techniques. One challenge in high-dimensional data is selecting variables from a large pool of candidates, and some existing methods suffer from a high false discovery rate. In quantile regression, the phenomenon of quantile crossing, in which estimated quantile curves for different levels cross one another, needs to be addressed. Finally, spatial and longitudinal studies require efficient methods for estimating covariance patterns. Motivated by these features of high-dimensional and complex data, the PI proposes in this grant application: (1) new techniques for variable selection in high-dimensional data and new methods to reduce the false discovery rate; (2) new techniques to handle quantile crossing, with application to class probability estimation; (3) new methods to estimate the covariance structure of spatial and longitudinal data; and (4) parametrically guided nonparametric estimation for the quasi-likelihood method. The asymptotic behavior of the proposed methods will be studied theoretically, and the methods will be compared with existing ones both theoretically and through simulations.

Scientists in many fields need high-dimensional variable selection techniques to efficiently analyze large-scale, complex financial, environmental, and biomedical data, such as gene expression, proteomics, metabolomics, and brain imaging data. These types of data require techniques that identify important features. To this end, the PI proposes a screening method for selecting appropriate statistical models. The screening method can be applied to biomedical data to locate important genes responsible for diseases of interest, such as breast cancer and leukemia. In environmental and clinical studies, spatial and longitudinal data are also often observed sparsely and irregularly, and much effort has been devoted to studying the covariance structures of such data. In this proposal, the PI proposes a flexible convolution-based method to estimate covariance structures nonparametrically. The method can be applied to environmental data such as precipitation and wind to improve our understanding of environmental changes, including the well-known issue of climate change. This research has many societal applications. In addition, the PI takes advantage of the department's mentoring program to work with US doctoral students, especially women and minorities. The PI also works with undergraduates from the department's NSF-CSUMS program, since it is important to train computationally strong critical thinkers for the future.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0905561
Program Officer: Gabor J. Szekely
Budget Start: 2009-07-01
Budget End: 2013-06-30
Fiscal Year: 2009
Total Cost: $120,000
Name: North Carolina State University Raleigh
City: Raleigh
State: NC
Country: United States
Zip Code: 27695