This is a project to develop new statistical methods for comparing groups of subjects in terms of health outcomes that are assessed using data from wearable devices. Inexpensive wearable sensors for health monitoring are now capable of generating massive amounts of data collected longitudinally, up to months at a time. The project will develop inferential methods that can deal with the complexity of such data. A serious challenge is the presence of unmeasured time-dependent confounders (e.g., circadian and dietary patterns), making direct comparisons or borrowing strength across subjects untenable unless the studies are carried out in controlled experimental con- ditions. Generic data mining and machine learning tools have been widely used to provide predictions of health status from such data. However, such tools cannot be used for signi?cance testing of covariate effects, which is necessary for designing precision medicine interventions, for example, without taking the inherent model selection or the presence of the unmeasured confounders into account. To overcome these dif?culties, a systematic de- velopment of inferential methods for functional outcome data obtained from wearable devices will be carried out. There are three speci?c aims: 1) Develop metrics for functional outcome data from wearable devices, 2) Develop nonparametric estimation and testing methods for activity pro?les and a screening method for predictors of activity pro?les, 3) Implement the methods in an R package and carry out two case studies using accelerometer data.
For Aim 1, the approach is to reduce the sensor data to occupation time pro?les (e.g., as a function of activity level), and formulate the statistical modeling in terms of these pro?les using survival and functional data analytic meth- ods. This will have a number of advantages, the principal one being that time-dependent confounders become less problematic because the effect of differences in temporal alignment across subjects is mitigated. In addition, survival analysis methods can be applied by viewing the occupation time as a time-to-event outcome indexed by activity level.
For Aim 2, nonparametric methods will be used to compare and order occupation time distributions between groups of subjects that are speci?ed in terms of baseline covariate levels or treatment groups. Further, a new method of post-selection inference based on marginal screening for function-on-scalar regression will be developed to identify and formally test whether covariates are signi?cantly associated with activity pro?les.
Aim 3 will develop an R-package implementation, and as a test-bed for the proposed methods they will be applied to two Columbia-based clinical studies: to the study of physical activity in children enrolled in New York City Head Start, and to the study of experimental drugs for the treatment of mitochondrial depletion syndrome.

Public Health Relevance

The relevance of the project to public health is that it will develop statistical methods for the physiological eval- uation of patients on the basis of data collected by inexpensive wearable sensors (e.g., accelerometers). By introducing methods for the rigorous comparison of healthcare status among groups of patients observed longi- tudinally over time using such devices, treatment decisions that can bene?t targeted populations of patients in terms of continuously-assessed health outcomes will become possible.

National Institute of Health (NIH)
National Institute on Aging (NIA)
Research Project (R01)
Project #
Application #
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Phillips, John
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Columbia University (N.Y.)
Biostatistics & Other Math Sci
Schools of Public Health
New York
United States
Zip Code