Adaptation of New Statistical Ideas for Medicine

Efron, Bradley

Abstract

Our MERIT award work will continue to have two main components: involvement in specific biomedical research projects such as NHLBI's FEHGAS study, and development of new statistical methods appropriate for the analysis of large, complex data sets. These efforts are complementary, with the specific projects suggesting which statistical methods are most needed, and also serving as test cases for new methodology. The FEHGAS study, for example, seeks to predict age of onset of hypertension from SNP data (and background variables such as age and gender). There are 550,000 SNPs available for prediction, most of which will turn out to be useless, making the problem an order of magnitude more challenging than in expression microarray situations. Efron plans to extend the empirical Bayes methodology from his recent paper to this context, hopefully overcoming the difficulties caused by the usually weak predictive power of individual SNPs. Olshen plans to extend CART (Classification and Regression Trees) and bootstrap methodology to the selection of groups of promising predictive SNPs.

Large-scale significance testing, for instance selecting 'significant' genes in a microarray cancer study, has become an area of intense statistical development. Nevertheless, crucial questions of appropriate implementation remain vague in the literature: the choice of an appropriate null hypothesis; the selection of a comparison set (should all 550,000 SNPs be tested together or separately by chromosome?); and the effects of correlation. We have made some headway in answering these questions, as described in the Progress Report. Our continuing efforts are a combination of methodological implementation and theoretical development.

Correlation can have particularly drastic effects on standard statistical techniques.
In "Are a set of microarrays independent of each other?" it is shown that a study involving 20,000 genes has its effective sample size reduced to about 17 because of severe gene-wise correlation. We are currently developing diagnostic methods to spot correlation difficulties in massive data sets, and to assess their effects on hypothesis tests, estimates, and predictions. A 20,000 gene microarray study produces 200,000,000 correlations, which sounds oppressively large for practical insight. But we are making progress on an empirical Bayes approximation that summarizes correlation effects in a single number, suitable for simple analysis.

Twentieth Century biostatistical applications were overwhelmingly frequentist in nature. Pure frequentism, though, becomes impractical for analyzing the large, complex data sets produced by modern biomedical devices, where the relationships of thousands of parameters and millions of data points have to be considered together. We are continuing to develop empirical Bayes methods that allow Bayesian ideas to be brought to bear on questions of multiple inference, without requiring specific prior distributions from the scientist. A long-term project is to understand how quickly empirical Bayes information accrues in a medical study. A False Discovery Rate is an estimate of the Bayes posterior probability that a gene (or a SNP, or a voxel) is 'null', given the observed data. How many subjects and how many genes do we need to observe in order to get an accurate empirical Bayes estimate of the posterior probability?

In our own version of Moore's law, biomedical data sets have increased an order of magnitude in size every few years since the 1990s. Emerging technologies (tiling arrays, bead arrays, aptamer chips, methylation arrays, exon chips, and a variety of new imaging devices) promise further increases, taxing both computational equipment and statistical methodology.
Our long-term MERIT goal is to provide algorithms and theory appropriate to massive-data biomedical requirements.
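
The abstract describes a False Discovery Rate as an estimate of the posterior probability that a case is 'null' given the data. A minimal Python sketch of one common tail-area form of that estimate follows. It is an illustration only, not the project's own software: the function names, the simulated z-values, the theoretical N(0,1) null, and the conservative default pi0 = 1 are all assumptions made here. The estimate is the ratio of the fraction of cases expected beyond a threshold under the null to the fraction actually observed beyond it.

```python
import math
import random


def normal_upper_tail(z):
    """Return P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_area_fdr(z_values, threshold, pi0=1.0):
    """Estimate P(null | z >= threshold) from the observed z-values.

    Numerator: fraction of cases expected beyond the threshold if every
    null case followed N(0, 1) (an assumed theoretical null), scaled by
    pi0, the assumed prior proportion of null cases.
    Denominator: fraction of cases actually observed beyond the threshold.
    """
    observed = sum(1 for z in z_values if z >= threshold) / len(z_values)
    if observed == 0.0:
        return float("nan")  # no cases beyond the threshold, nothing to estimate
    return min(1.0, pi0 * normal_upper_tail(threshold) / observed)


def simulate_z_values(n_null, n_signal, shift, seed=0):
    """Hypothetical data: null cases ~ N(0, 1), non-null cases ~ N(shift, 1)."""
    rng = random.Random(seed)
    nulls = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    signals = [rng.gauss(shift, 1.0) for _ in range(n_signal)]
    return nulls + signals


if __name__ == "__main__":
    z = simulate_z_values(n_null=9500, n_signal=500, shift=3.0)
    for t in (1.0, 2.0, 3.0, 4.0):
        print(f"threshold {t:.1f}: estimated Fdr = {tail_area_fdr(z, t):.3f}")
```

In this simulation the estimate falls as the threshold rises, because the cases beyond a high threshold are mostly non-null. The sketch treats the cases as independent; the abstract's point about correlation is precisely that this assumption can fail in real microarray or SNP data.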

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Type: Method to Extend Research in Time (MERIT) Award (R37)
Project #: 5R37EB002784-37
Application #: 8215793
Study Section: Special Emphasis Panel (NSS)
Program Officer: Peng, Grace

Project Start: 1993-01-15
Project End: 2015-01-31
Budget Start: 2012-02-01
Budget End: 2013-01-31
Support Year: 37
Fiscal Year: 2012
Total Cost: $204,890
Indirect Cost: $75,275

Institution

Name: Stanford University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 009214214

City: Stanford
State: CA
Country: United States
Zip Code: 94305

Related projects


NIH 2014 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$373,909
NIH 2013 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$190,707
NIH 2012 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$204,890
NIH 2011 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$205,404
NIH 2010 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$224,250
NIH 2009 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$209,400
NIH 2008 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$210,285
NIH 2007 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$216,479
NIH 2006 R37 EB	Adaptation of New Statistical Ideas for Medicine Efron, Bradley / Stanford University	$217,732

Publications

Efron, Bradley (2015) Frequentist accuracy of Bayesian estimates. J R Stat Soc Series B Stat Methodol 77:617-646

Efron, Bradley (2014) Estimation and Accuracy after Model Selection. J Am Stat Assoc 109:991-1007

Yoon, Sangho; Assimes, Themistocles L; Quertermous, Thomas et al. (2014) Insulin resistance: regression and clustering. PLoS One 9:e94129

Efron, Bradley (2014) Two modeling strategies for empirical Bayes estimation. Stat Sci 29:285-301

Won, Joong-Ho; Lim, Johan; Kim, Seung-Jean et al. (2013) Condition Number Regularized Covariance Estimation. J R Stat Soc Series B Stat Methodol 75:427-450

Olshen, Adam B; Hsieh, Andrew C; Stumpf, Craig R et al. (2013) Assessing gene-level translational control from ribosome profiling. Bioinformatics 29:2995-3002

Efron, Bradley (2013) Mathematics. Bayes' theorem in the 21st century. Science 340:1177-8

Smith, Roger S; Efron, Bradley; Mah, Cheri D et al. (2013) The impact of circadian misalignment on athletic performance in professional football players. Sleep 36:1999-2001

Bavinger, Clay; Bendavid, Eran; Niehaus, Katherine et al. (2013) Risk of cardiovascular disease from antiretroviral therapy for HIV: a systematic review. PLoS One 8:e59551

Won, Joong-Ho; Jeon, Yongkweon; Rosenberg, Jarrett K et al. (2013) Uncluttered Single-Image Visualization of Vascular Structures Using GPU and Integer Programming. IEEE Trans Vis Comput Graph 19:81-93

Showing the most recent 10 out of 24 publications
