An ongoing challenge in health sciences research is the development of statistical models for studying how subject characteristics and interventions relate to disease onset, recurrence, progression, and other health outcomes. This enterprise has become considerably more complex as new technologies, the quest to discover new biomarkers, and improved resources for handling vast repositories of data have led to the collection of high-dimensional information, and there has been extensive research on formal methods for identifying important prognostic variables to include in a model to be used, e.g., to assess population risk. The objective of the first two specific aims of this renewal application is to develop new methods for such variable selection in model-building.
Many key health status variables collected in studies of chronic disease, e.g., blood pressure or serum biomarker levels, are imprecise measurements of a "true" quantity relevant to understanding risk, such as long-term blood pressure. The first aim is to develop methods for variable selection when some covariates are subject to such measurement error.
Linear and generalized linear models for independent data and their mixed-effects counterparts for longitudinal and other clustered data are widely used, but these parametric models may not be sufficiently flexible to approximate the complex relationships involved. The second aim is to extend advances made in our previous project period toward new methods for simultaneous parameter estimation and variable selection in more flexible semiparametric versions of such models, developing new techniques that allow for arbitrary numbers of both parametric and nonparametric covariate effects, general outcome variables (e.g., continuous, binary), and adaptive identification of such effects.
A key objective in many studies is to elucidate the association between features of longitudinal profiles of biomarkers or other continuous measures and a primary health outcome using so-called joint models. Standard joint models represent the subject-specific profiles via a mixed-effects model, e.g., as straight lines with random subject-specific intercepts and slopes, whose random parameters are included as covariates in a model for the primary outcome. In some settings, interest may focus on the association between the outcome and not only features such as slopes but also intra-subject variation in the longitudinal measure. The third aim is to develop new methods for joint models involving both random intra-subject mean and variance parameters, exploiting techniques developed in the previous project period.
Many longitudinal measures are censored due to limits of quantification of the assay used in their determination, and longitudinal analysis must account for this appropriately. Our fourth aim focuses on the development of new methods for mixed-effects models that not only address this issue but also draw on work in previous project periods to relax the usual normality assumption on random effects and yield an estimate of their density, providing the analyst with a tool for exploring underlying features of the population.

Public Health Relevance

The research to be carried out in this project will provide health sciences researchers with new tools to build statistical models that can be used to learn about relationships among subject characteristics, such as physiologic, demographic, and genetic attributes and markers of disease progression, even if these are not measured precisely; treatments; and disease outcomes. These tools will enable researchers to use the models to identify key risk factors for disease and deleterious health outcomes. In addition, new methods will be developed for analyzing longitudinal biomarker measures that reflect disease progression when these are not always observed due to limits of the measuring device.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Dunn, Michelle C
North Carolina State University Raleigh
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Stefanski, L A; Wu, Yichao; White, Kyle (2014) Variable Selection in Nonparametric Classification via Measurement Error Model Selection Likelihoods. J Am Stat Assoc 109:574-589
Bernhardt, Paul W; Wang, Huixia Judy; Zhang, Daowen (2014) Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation. Comput Stat Data Anal 69:
Vock, David M; Davidian, Marie; Tsiatis, Anastasios A (2014) SNP_NLMM: A SAS Macro to Implement a Flexible Random Effects Density for Generalized Linear and Nonlinear Mixed Models. J Stat Softw 56:2
Hao, Ning; Zhang, Hao Helen (2014) Interaction Screening for Ultra-High Dimensional Data. J Am Stat Assoc 109:1285-1301
Shin, Seung Jun; Wu, Yichao; Zhang, Hao Helen (2014) Two-Dimensional Solution Surface for Weighted Support Vector Machines. J Comput Graph Stat 23:383-402
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen (2014) Structured functional additive regression in reproducing kernel Hilbert spaces. J R Stat Soc Series B Stat Methodol 76:581-603
Molenberghs, Geert; Kenward, Michael G; Aerts, Marc et al. (2014) On random sample size, ignorability, ancillarity, completeness, separability, and degeneracy: sequential trials, random sample sizes, and missing data. Stat Methods Med Res 23:11-41
Caner, Mehmet; Zhang, Hao Helen (2014) Adaptive Elastic Net for Generalized Methods of Moments. J Bus Econ Stat 32:30-47
Laber, Eric B; Tsiatis, Anastasios A; Davidian, Marie et al. (2014) Discussion of "Combining biomarkers to optimize patient treatment recommendation". Biometrics 70:707-10
Verbeke, Geert; Fieuws, Steffen; Molenberghs, Geert et al. (2014) The analysis of multivariate longitudinal data: a review. Stat Methods Med Res 23:42-59

Showing the most recent 10 out of 59 publications