Medical and biological data often come in the form of digitized signals and images: for example, mass spectrograms, electrocardiogram traces, human gait cycles, and even representations of gene expression arrays. As instrumental data acquisition becomes routine, sequences of such images, signals, or paths are collected, often along with other covariate measurements, resulting in datasets where the basic unit of measurement, or response, is a very high-dimensional object. The gene microarray is a leading example of how new technology has led to data acquisition on a massive scale; we also expect to work with more direct protein measurements obtained through mass spectrometry. The project continues to focus on developing techniques for modeling and understanding such data that naturally adapt to the high dimensionality. For regression and classification with gene expression arrays, we consider methods that blend univariate and multivariate approaches and that offer both good prediction and gene selection. To study covariance structure, the project continues to develop "sparse" forms of principal components and discriminant analysis that may be more sensitive to local phenomena of not necessarily smooth form, or that are better adapted to irregularly observed data. Corresponding quadratically regularized methods in appropriate bases form a natural foil for comparison, and inference procedures for some of these are proposed. For estimation of means, the project will examine sparse empirical Bayes and False Discovery Rate methods for estimating nonsmooth local phenomena. Much of this work will be carried out in existing and new collaborations with researchers in oncology, genetics, cardiology, and other specialties, working for example on cancer, heart disease, and human locomotion.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Type
Research Project (R01)
Project #
9R01EB001988-08
Application #
6687387
Study Section
Social Sciences, Nursing, Epidemiology and Methods 4 (SNEM)
Program Officer
Pastel, Mary
Project Start
1996-09-10
Project End
2007-06-30
Budget Start
2003-07-01
Budget End
2004-06-30
Support Year
8
Fiscal Year
2003
Total Cost
$367,559
Indirect Cost
Name
Stanford University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
009214214
City
Stanford
State
CA
Country
United States
Zip Code
94305
Johnstone, Iain M (2018) Tail sums of Wishart and Gaussian eigenvalues beyond the bulk edge. Aust N Z J Stat 60:65-74
Johnstone, Iain M; Paul, Debashis (2018) PCA in High Dimensions: An orientation. Proc IEEE Inst Electr Electron Eng 106:1277-1292
Reid, Stephen; Newman, Aaron M; Diehn, Maximilian et al. (2018) Genomic Feature Selection by Coverage Design Optimization. J Appl Stat 45:2658-2676
Powers, Scott; Qian, Junyang; Jung, Kenneth et al. (2018) Some methods for heterogeneous treatment effect estimation in high dimensions. Stat Med 37:1767-1787
Taylor, Jonathan; Tibshirani, Robert (2018) Post-Selection Inference for ℓ1-Penalized Likelihood Models. Can J Stat 46:41-61
Donoho, David L; Gavish, Matan; Johnstone, Iain M (2018) Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model. Ann Stat 46:1742-1778
Pataki, Camille I; Rodrigues, João; Zhang, Lichao et al. (2018) Proteomic analysis of monolayer-integrated proteins on lipid droplets identifies amphipathic interfacial α-helical membrane anchors. Proc Natl Acad Sci U S A 115:E8172-E8180
Groll, Andreas; Hastie, Trevor; Tutz, Gerhard (2017) Selection of effects in Cox frailty models by regularization methods. Biometrics 73:846-856
Johnstone, I M; Nadler, B (2017) Roy's largest root test under rank-one alternatives. Biometrika 104:181-193
Reid, Stephen; Tibshirani, Robert (2016) Sparse regression and marginal testing using cluster prototypes. Biostatistics 17:364-76

Showing the most recent 10 out of 61 publications