Evaluation of Sample Sizes Used to Train Classifiers and Prognostic Predictors

Dobbin, Kevin; Ahn, Jeongyoun

Abstract

The overall goal of this project is to produce methods that will improve the development of models for cancer prognosis and diagnosis. These improvements may expedite the translation of novel technologies towards clinically useful tools. Recent years have seen the development of many biological assays that measure hundreds or thousands of analytes in parallel. Examples include gene expression microarrays, microRNA assays, sequencing assays and SNP chips. Two common objectives of these studies are 1) to develop prognostic predictors of cancer patient survival or recurrence outcome, and 2) to develop classifiers that may be useful in patient treatment selection. Development of a prognostic predictor or classifier requires a training set, which is a collection of samples used to formulate the prognostic prediction or classification rule. This R21 project will develop methods for establishing the sample size required to train prognostic predictors and classifiers in high dimensional settings. Critical to evaluation of the methods will be assessment of the training performance on large datasets. The methods will be validated on microarray datasets because this high dimensional technology is relatively well-studied and there are publicly available cancer microarray datasets with required clinical data.
The specific aims of this proposal are therefore to 1) develop novel methods for sample size estimation in high dimensional training studies, 2) develop novel methods for removing batch effects from high dimensional datasets, and 3) validate the training sample size methodology on large agglomerated datasets that used the same microarray platform and studied similar patient populations. Long-term objective: It is foreseen that this R21 will develop into a suite of sample size methods for the design of studies to train high and medium dimensional classifiers and prognostic predictors. While the application in this R21 focuses on microarray data, expansion of the sample size and batch effect elimination methods to other technologies is foreseen as an important future direction of this research.

Public Health Relevance

Cancer "signatures" developed from high dimensional data, such as microarrays and single nucleotide polymorphism (SNP) arrays, hold the promise of making cancer treatments more personalized to the individual patient. This proposal will develop innovative statistical methods for determining how many tumor samples are required to identify a "signature." The new sample size methods will be validated by combining together high dimensional cancer patient data from existing data warehouses.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21CA152460-01A1
Application #: 8113682
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Verma, Mukesh

Project Start: 2011-08-04
Project End: 2013-07-31
Budget Start: 2011-08-04
Budget End: 2012-07-31
Support Year: 1
Fiscal Year: 2011
Total Cost: $161,494
Indirect Cost

Institution

Name: University of Georgia
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 004315578

City: Athens
State: GA
Country: United States
Zip Code: 30602

Related projects


NIH 2012 R21 CA	Evaluation of Sample Sizes Used to Train Classifiers and Prognostic Predictors Dobbin, Kevin K.; Ahn, Jeongyoun / University of Georgia	$161,494
NIH 2011 R21 CA	Evaluation of Sample Sizes Used to Train Classifiers and Prognostic Predictors Dobbin, Kevin K.; Ahn, Jeongyoun / University of Georgia	$161,494

Publications

Lee, Jung Ae; Dobbin, Kevin K; Ahn, Jeongyoun (2014) Covariance adjustment for batch effect in gene expression data. Stat Med 33:2681-95

Song, Xiao; Wang, Ching-Yun (2014) Proportional Hazards Model with Covariate Measurement Error and Instrumental Variables. J Am Stat Assoc 109:1636-1646

Dobbin, Kevin K; Song, Xiao (2013) Sample size requirements for training high-dimensional risk predictors. Biostatistics 14:639-52

Song, Xiao; Zhou, Xiao-Hua; Ma, Shuangge (2012) Nonparametric receiver operating characteristic-based evaluation for survival outcomes. Stat Med 31:2660-75
