Medical and biological data often come in the form of signals, including sequences, and images. In the biomedical setting, microarrays, high-throughput sequencing, protein arrays and many other assays are in widespread use. Similarly, electromagnetic brain imaging techniques (MRI, fMRI and EEG/MEG) are used to study brain anatomy and cortical activity. The nature of these data brings major challenges for statistical analysis: specifically, the number of measurements is often much larger than the number of cases, and there are correlations among the components. The broad aim of this ongoing three-investigator grant is to develop and study statistical techniques that enhance the analysis and interpretation of these data. Our focus in the new projects is the development of models and methods to extract maximal information from these emerging technologies and, as statisticians, to guide the scientist in valid interpretation of the results. The renewal will address these goals through four Specific Aims. The investigators will study:
1. Post-selection inference for comparing internal to external predictors. For genomic and other -omic data, valid statistical comparison of empirical biomarker signatures to standard clinical predictors such as height, weight, and age, using new tools from post-selection inference;
2. Statistical methods for cancer detection via CAPP-seq. Statistical and computational approaches for determining which contiguous regions (tiles) of the genome should be sequenced, in the search for cancer mutations directed toward earlier cancer detection;
3. New settings for high-dimensional eigenstructure in virology and genetics. Eigenvector estimation methods for vaccine design in virology based on mutation sequence data; statistical tools for understanding the distribution of the eigenvalues of large variance component matrices in quantitative genetics by adapting recent advances in statistical random matrix theory;
4. Locally smooth models for MRI data. Improving the sensitivity and resolution of quantitative and diffusion MRI by using models that exploit the spatial structure of the imaging domain.
Working together, and with their students, the investigators will implement the new statistical tools in publicly available software, following a pattern established in earlier cycles of this grant, in which our packages have found wide use among medical researchers both at Stanford and around the world.
Statistical methods such as those to be developed in this project are essential tools to help medical researchers discover and validate new basic science results (for example in imaging and genomics) that can lead to new therapies. They also aid in the design and analysis of clinical investigations of new treatments, so that the large amount of data collected in current research is used as efficiently as possible while the degree of uncertainty in the conclusions is accurately described.
Taylor, Jonathan; Tibshirani, Robert (2018) Post-Selection Inference for ℓ1-Penalized Likelihood Models. Can J Stat 46:41-61
Donoho, David L; Gavish, Matan; Johnstone, Iain M (2018) Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model. Ann Stat 46:1742-1778
Pataki, Camille I; Rodrigues, João; Zhang, Lichao et al. (2018) Proteomic analysis of monolayer-integrated proteins on lipid droplets identifies amphipathic interfacial α-helical membrane anchors. Proc Natl Acad Sci U S A 115:E8172-E8180
Johnstone, Iain M (2018) Tail sums of Wishart and Gaussian eigenvalues beyond the bulk edge. Aust N Z J Stat 60:65-74
Johnstone, Iain M; Paul, Debashis (2018) PCA in High Dimensions: An orientation. Proc IEEE Inst Electr Electron Eng 106:1277-1292
Reid, Stephen; Newman, Aaron M; Diehn, Maximilian et al. (2018) Genomic Feature Selection by Coverage Design Optimization. J Appl Stat 45:2658-2676
Powers, Scott; Qian, Junyang; Jung, Kenneth et al. (2018) Some methods for heterogeneous treatment effect estimation in high dimensions. Stat Med 37:1767-1787
Groll, Andreas; Hastie, Trevor; Tutz, Gerhard (2017) Selection of effects in Cox frailty models by regularization methods. Biometrics 73:846-856
Johnstone, I M; Nadler, B (2017) Roy's largest root test under rank-one alternatives. Biometrika 104:181-193
Reid, Stephen; Tibshirani, Robert (2016) Sparse regression and marginal testing using cluster prototypes. Biostatistics 17:364-76
Showing the most recent 10 out of 61 publications