There is an acute and increasing need to adapt standard statistical methods and to develop new approaches for the analysis of very large data sets. A data set is very large if it raises very difficult or insurmountable computational problems for standard data analysis using available computing systems. The accelerated increase in the size and complexity of data sets is due in part to increased computational and storage capabilities, new measurement technologies and study designs, and an increasing number of study "units." This proposal is concerned with statistical methods for the analysis of an emerging type of very large data set, where very high-dimensional outcomes and predictors, such as images or densely sampled biosignals, are recorded at multiple visits on hundreds or thousands of subjects. The proposed methods will describe the cross-sectional, longitudinal, and measurement error variability in longitudinal studies where the observed data are functions or images. Methods for scalar-on-function and scalar-on-image regression will also be addressed for the case of very high-dimensional predictors. The proposed methodology is inspired by and applied to very large studies of sleep and Diffusion Tensor Imaging (DTI) brain tractography.

Public Health Relevance

The project provides statistical analysis methods for very large data sets where images or densely sampled biological signals are measured at multiple visits. The methods are applied to longitudinal sleep electroencephalogram (EEG) data and to brain tractography obtained from Diffusion Tensor Imaging (DTI) in subjects with Multiple Sclerosis (MS) and in healthy subjects.

National Institutes of Health (NIH)
National Institute of Neurological Disorders and Stroke (NINDS)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Gnadt, James W
Johns Hopkins University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Lee, Kuo-Jung; Jones, Galin L; Caffo, Brian S et al. (2014) Spatial Bayesian Variable Selection Models on Functional Magnetic Resonance Imaging Time-Series Data. Bayesian Anal 9:699-732
Risk, Benjamin B; Matteson, David S; Ruppert, David et al. (2014) An evaluation of independent component analyses with an application to resting-state fMRI. Biometrics 70:224-36
McLean, Mathew W; Hooker, Giles; Staicu, Ana-Maria et al. (2014) Functional Generalized Additive Models. J Comput Graph Stat 23:249-269
Lindquist, Martin A; Xu, Yuting; Nebel, Mary Beth et al. (2014) Evaluating dynamic bivariate correlations in resting-state fMRI: a comparison study and a new approach. Neuroimage 101:531-46
Eloyan, Ani; Shou, Haochang; Shinohara, Russell T et al. (2014) Health effects of lesion localization in multiple sclerosis: spatial registration and confounding adjustment. PLoS One 9:e107263
Nebel, Mary Beth; Joel, Suresh E; Muschelli, John et al. (2014) Disruption of functional organization within the primary motor cortex in children with autism. Hum Brain Mapp 35:567-80
Goldsmith, Jeff; Huang, Lei; Crainiceanu, Ciprian M (2014) Smooth Scalar-on-Image Regression via Spatial Bayesian Variable Selection. J Comput Graph Stat 23:46-64
Ma, Xin; Xiao, Luo; Wong, Wing Hung (2014) Learning regulatory programs by threshold SVD regression. Proc Natl Acad Sci U S A 111:15675-80
He, Bing; Bai, Jiawei; Zipunnikov, Vadim V et al. (2014) Predicting human movement with multiple accelerometers using movelets. Med Sci Sports Exerc 46:1859-66
Di, Chongzhi; Crainiceanu, Ciprian M; Jank, Wolfgang S (2014) Multilevel sparse functional principal component analysis. Stat 3:126-143

Showing the most recent 10 out of 33 publications