Consistent model selection in the p>>n setting

Johnson, Valen

Abstract

Among the most fundamental and commonly encountered statistical problems in medical research is the problem of model selection. Model selection is the process by which researchers identify the relationships between measured quantities; thus it plays a central role in the analysis of essentially all high-throughput screening data. Model selection procedures represent the primary analytical mechanism through which the associations between diseases and large numbers of biochemical, genetic and pharmacological variables are discovered. The fundamental hypothesis tested in this application is that a new class of model selection procedures can be used to effectively identify associations between biological variables and disease outcomes, even in settings where there are many more potential biological correlates than there are observations on each variable. The goals of this project are to develop these variable selection procedures so that they can be applied to high-throughput screening data, and to apply the resulting methodology in three important application areas. To achieve these goals, the following specific aims will be addressed. Known theoretical properties of the proposed model selection procedures will be extended to cases in which there are many more biological measurements available than there are observations on each measurement (i.e., the p>>n setting). Constraints on the number of variables that can be included in final models for outcome variables will be determined, and efficient numerical algorithms will be developed so that these methods can be applied to actual high-throughput screening data. The new model selection procedures will be used to define binary classification algorithms that can predict clinical outcomes from high-dimensional gene expression data sets. The new model selection procedures will be used to identify and analyze interactions between genes that are associated with cancer and other diseases in genome-wide association studies using single-nucleotide polymorphism data. The new model selection procedures will be used to analyze biological pathways as informed by high-throughput molecular interrogation data. The algorithms developed during this project constitute a major innovation in the field of model selection and will provide medical researchers with a new and unique set of tools for effectively identifying biological associations among biomarkers, disease attributes, and patient outcomes from high-throughput screening data.

Public Health Relevance

Model selection procedures are statistical techniques that allow researchers to discover the associations between disease and the large number of variables that are measured in emerging high-throughput screening technologies. For example, model selection techniques are used to discover which genes are associated with particular forms of cancer. This project proposes a new class of model selection procedures that will make it easier for researchers to discover such associations.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 1R01CA158113-01
Application #: 8084684
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Dunn, Michelle C

Project Start: 2011-04-01
Project End: 2015-03-31
Budget Start: 2011-04-01
Budget End: 2012-03-31
Support Year: 1
Fiscal Year: 2011
Total Cost: $310,258
Indirect Cost

Institution

Name: University of Texas MD Anderson Cancer Center
Department: Biostatistics & Other Math Sci
Type: Other Domestic Higher Education
DUNS #: 800772139

City: Houston
State: TX
Country: United States
Zip Code: 77030

Related projects


NIH 2020 R01 CA	Consistent variable selection in p>>n settings Johnson, Valen Earl / Texas A&M University
NIH 2019 R01 CA	Consistent variable selection in p>>n settings Johnson, Valen Earl / Texas A&M University
NIH 2018 R01 CA	Consistent variable selection in p>>n settings Johnson, Valen Earl / Texas A&M University
NIH 2017 R01 CA	Consistent variable selection in p>>n settings Johnson, Valen Earl / Texas A&M University
NIH 2016 R01 CA	Consistent variable selection in p>>n settings Johnson, Valen Earl / Texas A&M University	$333,201
NIH 2014 R01 CA	Consistent model selection in the p>>n setting Johnson, Valen Earl / Texas A&M University	$290,219
NIH 2013 R01 CA	Consistent model selection in the p>>n setting Johnson, Valen Earl / Texas A&M University	$280,925
NIH 2012 R01 CA	Consistent model selection in the p>>n setting Johnson, Valen Earl / University of Texas MD Anderson Cancer Center	$291,267
NIH 2011 R01 CA	Consistent model selection in the p>>n setting Johnson, Valen Earl / University of Texas MD Anderson Cancer Center	$310,258

Publications

Shin, Minsuk; Bhattacharya, Anirban; Johnson, Valen E (2018) Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-dimensional Settings. Stat Sin 28:1053-1078

Rossell, David; Telesca, Donatello (2017) Non-Local Priors for High-Dimensional Estimation. J Am Stat Assoc 112:254-265

Papaspiliopoulos, O; Rossell, D (2017) Bayesian block-diagonal variable selection and model averaging. Biometrika 104:343-359

Johnson, Valen E; Payne, Richard D; Wang, Tianying et al. (2017) On the Reproducibility of Psychological Science. J Am Stat Assoc 112:1-10

Liu, Suyu; Johnson, Valen E (2016) A robust Bayesian dose-finding design for phase I/II clinical trials. Biostatistics 17:249-63

Nikooienejad, Amir; Wang, Wenyi; Johnson, Valen E (2016) Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors. Bioinformatics 32:1338-45

Wang, Yuan; Hobbs, Brian P; Hu, Jianhua et al. (2015) Predictive classification of correlated targets with application to detection of metastatic cancer using functional CT imaging. Biometrics 71:792-802

Hu, Jianhua; Zhu, Hongjian; Hu, Feifang (2015) A Unified Family of Covariate-Adjusted Response-Adaptive Designs Based on Efficiency and Ethics. J Am Stat Assoc 110:357-367

Yajima, Masanao; Telesca, Donatello; Ji, Yuan et al. (2015) Detecting differential patterns of interaction in molecular pathways. Biostatistics 16:240-51

Rossell, David (2015) Big Data and Statistics: A Statistician's Perspective. Metode Sci Stud J 5:143-149

Showing the most recent 10 out of 23 publications
