Proposal Number: DMS 9803273
PI: Minge Xie
Institution: Rutgers University
Project: Messy Data Modeling and Related Topics

Abstract: The main objective of this research is to investigate a number of problems that arise in dealing with messy data sets, which occur in many scientific disciplines. The data studied in this research violate conventional assumptions, such as independence and homogeneity, that are adopted under more standard settings. Six specific topics are presented; each corresponds to at least one violation of conventional assumptions, and a messy data set may exhibit one or more of these types of violations. The topics originate from two different motivating data sets. Two of the six address problems concerning the group testing scheme (Dorfman, 1943) and its variants, including modeling false negatives and relaxing the assumption of independence among individuals. The remaining topics investigate practical and theoretical issues related to modeling batch-correlated regression data and develop new models and methods for heterogeneous observations. These developments will not only solve the specific types of problems considered, but also stimulate new research to develop more general methodologies.

This research develops statistical methodologies, models, and related theories to address issues arising from the modeling and analysis of messy data, which can be found in many disciplines, including the life sciences, environmental science, social science, industry, and economics. A common feature of these messy data is that they all violate some conventional model assumptions that are otherwise adopted under more standard settings. Accurate modeling of messy data can eliminate irrelevant information and provide a better understanding of underlying mechanisms, ultimately benefiting prediction and decision making.
Although tremendous progress has been made in the development of both sophisticated statistical methodologies and elegant mathematical theories over the past half century, many important problems in the modeling and analysis of messy data have yet to be tackled, from both practical and theoretical viewpoints. This research investigates several such problems. Although the models and methodologies are tailored to specific problems in the pharmaceutical industry and environmental science, these developments will not only solve those specific types of problems but also stimulate new research to develop more general methodologies.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
9803273
Program Officer
John Stufken
Project Start
Project End
Budget Start
1998-07-15
Budget End
2002-06-30
Support Year
Fiscal Year
1998
Total Cost
$46,786
Indirect Cost
Name
Rutgers University
Department
Type
DUNS #
City
New Brunswick
State
NJ
Country
United States
Zip Code
08901