Medical and biological data often come in the form of digitized signals and images, for example magnetic resonance images (MRI), ion channel electrical series, and human gait paths. As data acquisition becomes easier, sequences of such images or signals are collected, often along with other covariate measurements, resulting in data sets where the basic unit of measurement or response is a high-dimensional object. This project proposes a battery of statistical techniques for modeling and understanding such data that explicitly take into account, and indeed exploit, the inherent spatial or temporal correlation and, when appropriate, relate it to covariate information. By imposing spatial smoothness in the image or signal domain, pixel-wise regression and canonical correlation models can borrow strength from neighboring pixels. This not only improves the overall efficiency of these techniques but also allows identification of important regions rather than individual pixels. The project develops appropriate versions of nonparametric regression for such series of images, as well as data descriptions such as clustering, principal component, and singular value decomposition models. In many cases, wavelets will be used to achieve spatial smoothness. In the case of ion channel data, the models are used to isolate particular weak high-frequency components from correlated noise. Much of this work will be carried out in collaboration with radiologists, physiologists, and other biomedical researchers working on cancer, heart disease and stroke, brain mapping, and gait analysis.
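The idea of borrowing strength from neighboring pixels can be illustrated with a minimal sketch. It is not the project's method: the abstract mentions wavelets, whereas this example substitutes a simple box (neighbor-averaging) filter applied to pixel-wise least-squares slopes. The simulated data, the function names (`pixelwise_slopes`, `box_smooth`, `simulate`), and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def pixelwise_slopes(images, x):
    """Least-squares slope of each pixel's intensity on a scalar covariate.

    images: array of shape (n, H, W); x: array of shape (n,).
    Returns an (H, W) map of slopes, one independent regression per pixel.
    """
    xc = x - x.mean()
    yc = images - images.mean(axis=0)
    # Per-pixel slope: sum_i xc_i * yc_i / sum_i xc_i^2
    return np.tensordot(xc, yc, axes=(0, 0)) / np.sum(xc ** 2)

def box_smooth(img, radius=1):
    """Average each pixel with its neighbors in a (2r+1) x (2r+1) window.

    A crude stand-in for a spatial smoothness penalty (assumption: the
    project itself would use wavelets or another smoother).
    """
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    height, width = img.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (2 * radius + 1) ** 2

def simulate(n=40, size=16, noise=2.0, seed=0):
    """Hypothetical data: the covariate affects only a square region."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    truth = np.zeros((size, size))
    truth[4:10, 4:10] = 1.0  # the "important region"
    images = x[:, None, None] * truth + noise * rng.normal(size=(n, size, size))
    return images, x, truth

images, x, truth = simulate()
raw = pixelwise_slopes(images, x)      # independent per-pixel estimates
smooth = box_smooth(raw, radius=1)     # estimates pooled across neighbors

mse_raw = np.mean((raw - truth) ** 2)
mse_smooth = np.mean((smooth - truth) ** 2)
print(f"MSE raw: {mse_raw:.4f}, MSE smoothed: {mse_smooth:.4f}")
```

In this simulated setting the smoothed slope map is less noisy than the raw one, at the cost of some blurring at the region boundary. That trade-off is why the choice of smoother matters and why an adaptive basis such as wavelets may be preferred.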
Shen-Orr, Shai S; Tibshirani, Robert; Khatri, Purvesh et al. (2010) Cell type-specific gene expression differences in complex tissues. Nat Methods 7:287-9
Witten, Daniela M; Tibshirani, Robert; Hastie, Trevor (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10:515-34
Friedman, Jerome; Hastie, Trevor; Tibshirani, Robert (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9:432-41
Johnstone, Iain M (2008) Multivariate analysis and Jacobi ensembles: largest eigenvalue, Tracy-Widom limits and rates of convergence. Ann Stat 36:2638
Park, Mee Young; Hastie, Trevor (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30-50
Park, Mee Young; Hastie, Trevor; Tibshirani, Robert (2007) Averaged gene expressions for regression. Biostatistics 8:212-27
Chipman, Hugh; Tibshirani, Robert (2006) Hybrid hierarchical clustering with applications to microarray data. Biostatistics 7:286-301
Zhu, Ji; Hastie, Trevor (2004) Classification of gene microarrays by penalized logistic regression. Biostatistics 5:427-43
Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian et al. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 99:6567-72
Troyanskaya, O; Cantor, M; Sherlock, G et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17:520-5