Modern observational and experimental biological data have undergone a revolution. Driven by new biotechnology and computing advances, high-dimensional, high-density, multilevel functional, and longitudinal biological signals are becoming commonplace in medical and public health research. Historically, such signals arose in small clinical or experimental studies, a setting often referred to as the "small n, large p" problem. We view the extension of these biological signals to cohort studies with longitudinal or hierarchical structure as the next generation of biostatistical problems, which we call the "hierarchical large n, large p" problem. The goal of this grant is to introduce general methods for analyzing this form of biostatistical data. We propose three major aims for the analysis of multilevel or longitudinally collected biosignals. The first extends multilevel functional principal components, the investigators' generalization of functional principal components, to longitudinal and high-dimensional settings. The second extends the investigators' bi-directional filtering to high-dimensional and longitudinal settings. The third extends model-based independent component blind source separation to longitudinal settings. To achieve this aim, we will also address the fundamental problem of running MCMC samplers over high-dimensional parameter spaces. Specifically, no existing work addresses convergence control when the number of parameters exceeds the number of iterations. We propose a method of convergence control based on finite population sampling. Our methods will be applied to unique data sets involving imaging (MRI, fMRI, DTI), electrophysiology (EEG, ECoG), sleep measurement (polysomnography), and novel measurements of aging (accelerometry).
In the preliminary results, we demonstrate our capacity to work with such data through novel findings from EEG, MRI, and fMRI data sets. Methods such as unsupervised clustering, blind source separation, and dimension reduction are widely recognized as first steps in analyzing high-dimensional data and have been applied successfully in a remarkably diverse range of settings. Our proposal generalizes these basic approaches to high-dimensional data while accounting for hierarchical and longitudinal sources of variation. Hence, our approaches will form a foundation for analyzing next-generation biomedical functional data.

Public Health Relevance

Modern observational data often consist of longitudinal or multilevel functional biological signals. We propose a foundational approach for the analysis of such data, including computing methods that scale to next-generation data sets.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Type: Research Project (R01)
Project #: 5R01EB012547-03
Application #: 8321037
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Luo, James
Project Start: 2010-09-30
Project End: 2016-10-31
Budget Start: 2012-09-01
Budget End: 2013-08-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $342,017
Indirect Cost: $111,114

Organization
Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 001910777
City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Publications
Chén, Oliver Y; Crainiceanu, Ciprian; Ogburn, Elizabeth L et al. (2018) High-dimensional multivariate mediation with application to neuroimaging data. Biostatistics 19:121-136
Rubin, Leah H; Sacktor, Ned; Creighton, Jason et al. (2018) Microglial activation is inversely associated with cognition in individuals living with HIV on effective antiretroviral therapy. AIDS 32:1661-1667
Mejia, Amanda F; Nebel, Mary Beth; Barber, Anita D et al. (2018) Improved estimation of subject-level functional connectivity using full and partial correlation with empirical Bayes shrinkage. Neuroimage 172:478-491
Chen, Shaojie; Liu, Kai; Yang, Yuguang et al. (2017) An M-estimator for reduced-rank system identification. Pattern Recognit Lett 86:76-81
Chen, Shaojie; Huang, Lei; Qiu, Huitong et al. (2017) Parallel group independent component analysis for massive fMRI data sets. PLoS One 12:e0173496
Muschelli, John; Sweeney, Elizabeth M; Ullman, Natalie L et al. (2017) PItcHPERFeCT: Primary Intracranial Hemorrhage Probability Estimation using Random Forests on CT. Neuroimage Clin 14:379-390
Webb-Vargas, Yenny; Chen, Shaojie; Fisher, Aaron et al. (2017) Big Data and Neuroimaging. Stat Biosci 9:543-558
Choe, Ann S; Nebel, Mary Beth; Barber, Anita D et al. (2017) Comparing test-retest reliability of dynamic functional connectivity methods. Neuroimage 158:155-175
Yue, Chen; Zipunnikov, Vadim; Bazin, Pierre-Louis et al. (2016) Parametrization of white matter manifold-like structures using principal surfaces. J Am Stat Assoc 111:1050-1060
Fisher, Aaron; Caffo, Brian; Schwartz, Brian et al. (2016) Fast, Exact Bootstrap Principal Component Analysis for p > 1 million. J Am Stat Assoc 111:846-860

The 10 most recent of 74 publications are listed above.