This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5). The objective of this research is to develop novel or improved methods for statistical model building and risk factor estimation when the training set has complex continuous and discrete attribute variables. These include, but are not limited to, extremely long attribute vectors in which only a small set of interacting variables is believed to be relevant, though it is not known which. In addition, noisy relationship or dissimilarity information may be available for sufficiently many pairs of training set members, along with multiple correlated Bernoulli outcomes. In this setting, the goal is to model the relations between the attribute and dissimilarity information and the multiple Bernoulli outcomes, as well as the correlations and conditional relationships among those outcomes. In particular, the work will address the development of tuning procedures for the multiple correlated Bernoulli case with heterogeneous input data and penalty functionals, variable selection in this complex setting, and novel or improved computational tools for handling large data sets with complex optimization criteria.

With the availability of huge amounts of data and high-speed computing, modern statistical model building and data mining tools are extracting valuable information in just about every scientific field, from astronomy to zoology, as well as in marketing, government, defense, health and the economy. Data sets in many areas of interest contain deeply embedded information about the risks of various outcomes given complex inputs or observables. However, the generation of huge, complex data sets in many fields is beginning to outstrip the tools available for analyzing them. This project proposes to develop a new paradigm, building on previous results, whose core is the ability to handle complex heterogeneous data structures and to understand the relationships between complex multivariate input information and complex, correlated multivariate responses. When completed and disseminated, the proposed work will provide a set of useful tools for new or improved statistical model building and data mining that relate observational data with complex input structures to multiple complex, correlated outcomes.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0906818
Program Officer: Gabor J. Szekely
Budget Start: 2009-07-01
Budget End: 2013-09-30
Fiscal Year: 2009
Total Cost: $582,405
Name: University of Wisconsin Madison
City: Madison
State: WI
Country: United States
Zip Code: 53715