Fence Methods for Mixed Model Selection: Theory and Applications

Rao, J Sunil; Jiang, Jiming

Abstract

This project aims to develop a new class of model selection strategies, known as fence methods. The general idea is to isolate a subgroup of so-called correct models, of which the optimal model is a member. This is accomplished by constructing a statistical fence, or barrier, that carefully eliminates incorrect models. Once the fence is constructed, the optimal model is selected from among the correct models (those within the fence) according to a criterion such as model simplicity. This last step, the selection of the optimal model within the fence, can be made flexible to take scientific or economic considerations into account. The PIs have developed this concept within the context of mixed model selection, which includes, among other things, linear mixed models and generalized linear mixed models with clustered or non-clustered data. This project aims to:

1) Develop new fence methodology for gene set analysis in gene expression (microarray) studies. Gene sets are a priori groupings of genes whose activity is thought to be related (often via biological pathways), so it is of interest to know whether these groups are perturbed under changing conditions such as worsening of disease (in this case, worsening of colon cancer). Such knowledge would indicate which pathways are implicated in poor versus better outcomes, potentially providing novel biological targets for diagnostics or therapeutics. Fence methods offer a potentially rich class of approaches to this task. Aim 1 will develop the theory and optimality of these approaches in detail and compare them comprehensively with existing methods. The new methods will then be applied to a large repository of colon cancer microarray data representing the various stages of the disease. Working closely with a biological collaborator, the investigators will validate the pathways implicated by the fence and interpret them biologically.

2) Develop new fence methodology for analyzing large-scale health survey data, with small area estimation in mind. Fence methods will be developed along two tracks. The first allows a richer class of nonparametric small area estimation mixed models to be used, with the degree of smoothing for the fixed effects part of the model assessed by appropriate fence approaches. The second develops a fence approach for choosing among competing small area models based on the quality of their predictions of small area random effects. In both cases, theory for the fence methods will be developed, and the methods will be applied to a large health care survey collected at NIH.

3) Extend fence methods. Extensions will include new computational approaches known as grating, as well as new ways of implementing the fence for association studies, with applications to large case-control SNP association studies. Again, detailed theory will be developed and applications undertaken with appropriate collaborators.

4) Develop free software implementing the fence methods developed in this project. The software will be written in the statistical package R, allowing users to integrate it with other software continually being developed around the world.
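The two-step idea described in the abstract, first building a fence that excludes poorly fitting models and then choosing the simplest model inside it, can be illustrated with a minimal Python sketch. It assumes each candidate model has already been summarized by a lack-of-fit value Q(M) (smaller is better) and a dimension; the names `Candidate`, `build_fence`, and `select_within_fence` are hypothetical and are not taken from the project's R software. In the fence literature the cutoff is usually scaled by an estimated standard deviation of the difference in lack of fit and may be chosen adaptively from the data; here it is simply a fixed, user-supplied constant, and the lack-of-fit values in the usage lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """One candidate model, summarized by quantities assumed to be precomputed."""
    name: str
    lack_of_fit: float  # hypothetical Q(M): smaller means the model fits better
    dimension: int      # number of parameters, the default "simplicity" score


def build_fence(candidates: List[Candidate], cutoff: float) -> List[Candidate]:
    """Step 1: keep models whose lack of fit is within `cutoff` of the best one.

    The best-fitting candidate (smallest Q) is the baseline, so it is always
    inside the fence. `cutoff` is a fixed constant in this simplified sketch.
    """
    best = min(c.lack_of_fit for c in candidates)
    return [c for c in candidates if c.lack_of_fit - best <= cutoff]


def select_within_fence(
    fence: List[Candidate],
    key: Callable[[Candidate], tuple] = lambda c: (c.dimension, c.lack_of_fit),
) -> Candidate:
    """Step 2: pick the optimal model among those inside the fence.

    Default rule: the simplest model (smallest dimension), with ties broken
    by better fit. A different `key` can encode other considerations.
    """
    return min(fence, key=key)


# Usage with made-up lack-of-fit values (illustrative only, not project results).
models = [
    Candidate("intercept only", 120.0, 1),
    Candidate("x1", 52.3, 2),
    Candidate("x1 + x2", 51.9, 3),
    Candidate("x1 + x2 + x3", 51.7, 4),
]
fence = build_fence(models, cutoff=2.0)
print([m.name for m in fence])          # ['x1', 'x1 + x2', 'x1 + x2 + x3']
print(select_within_fence(fence).name)  # x1
```

Passing a different `key` to `select_within_fence` mirrors the abstract's point that the final selection can reflect scientific or economic considerations rather than simplicity alone; for example, `key=lambda c: c.lack_of_fit` would return the best-fitting model inside the fence.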

Public Health Relevance

Correlated data are widely collected across the medical sciences, from imaging data to longitudinal clinical trial data to family-based genetic data, all in an effort to better understand the underlying determinants of disease. Mixed models provide a rich framework for modeling such data and making the best use of the various kinds of structure naturally present in them. However, selecting among competing mixed models has proven to be a much more elusive problem, with little guidance available in the literature. Building on their recent successes in this area, the PIs of this proposal offer a new, elegant way to tackle this problem for complex data, and will study the proposed methods rigorously, both statistically and through a variety of applications developed in collaboration with prominent laboratories at their home institutions and elsewhere. These applications include gene set analysis from gene expression (microarray) studies, association analysis from high-throughput SNP studies, and small area estimation from large health survey data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM085205-06
Application #: 8643252
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul

Project Start: 2010-04-01
Project End: 2015-03-31
Budget Start: 2014-04-01
Budget End: 2015-03-31
Support Year: 6
Fiscal Year: 2014
Total Cost: $259,778
Indirect Cost: $44,327

Institution

Name: University of Miami School of Medicine
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 052780918

City: Coral Gables
State: FL
Country: United States
Zip Code: 33146

Related projects


NIH 2014 R01 GM	Fence Methods for Mixed Model Selection: Theory and Applications Rao, J Sunil; Jiang, Jiming / University of Miami School of Medicine	$259,778
NIH 2013 R01 GM	Fence Methods for Mixed Model Selection: Theory and Applications Rao, J Sunil; Jiang, Jiming / University of Miami School of Medicine	$252,593
NIH 2012 R01 GM	Fence Methods for Mixed Model Selection: Theory and Applications Rao, J Sunil; Jiang, Jiming / University of Miami School of Medicine	$271,551
NIH 2011 R01 GM	Fence Methods for Mixed Model Selection: Theory and Applications Rao, J Sunil; Jiang, Jiming / University of Miami School of Medicine	$272,933
NIH 2010 R01 GM	Fence Methods for Mixed Model Selection: Theory and Applications Rao, J Sunil; Jiang, Jiming / University of Miami School of Medicine	$297,671

Publications

Jiang, Jiming; Nguyen, Thuan; Rao, J Sunil (2015) The E-MS Algorithm: Model Selection with Incomplete Data. J Am Stat Assoc 110:1136-1147

Nguyen, Thuan; Peng, Jie; Jiang, Jiming (2014) Fence Methods for Backcross Experiments. J Stat Comput Simul 84:644-662

Lin, Bingqing; Pang, Zhen; Jiang, Jiming (2013) Fixed and Random Effects Selection by REML and Pathwise Coordinate Optimization. J Comput Graph Stat 22:341-355

Dazard, Jean-Eudes; Rao, J Sunil (2012) Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data. Comput Stat Data Anal 56:2317-2333

Dazard, Jean-Eudes; Xu, Hua; Rao, J Sunil (2011) R package MVR for Joint Adaptive Mean-Variance Regularization and Variance Stabilization. Proc Am Stat Assoc 2011:3849-3863

Nguyen, Thuan; Jiang, Jiming (2011) Simple estimation of hidden correlation in repeated measures. Stat Med 30:3403-3415

Dazard, Jean-Eudes; Rao, J Sunil (2010) Local Sparse Bump Hunting. J Comput Graph Stat 19:900-929

Dazard, Jean-Eudes; Rao, J Sunil (2010) Regularized Variance Estimation and Variance Stabilization of High Dimensional Data. Proc Am Stat Assoc 2010:5295-5309
