An ongoing challenge in health sciences research is the development of statistical models for studying how subject characteristics and interventions relate to disease onset, recurrence, progression, and other health outcomes. This enterprise has become considerably more complex as new technologies, the quest to discover new biomarkers, and improved resources for handling vast repositories of data have led to the collection of high-dimensional information, and there has been extensive research on formal methods for identifying important prognostic variables to include in a model to be used, e.g., to assess population risk. The objective of the first two specific aims of this renewal application is to develop new methods for such variable selection in model-building. Many key health status variables collected in studies of chronic disease, e.g., blood pressure or serum biomarker levels, are imprecise measurements of a "true" quantity relevant to understanding risk, such as long-term blood pressure.
The first aim is to develop methods for variable selection when some of these covariates are subject to measurement error. Linear and generalized linear models for independent data and their mixed-effects counterparts for longitudinal and other clustered data are widely used, but these parametric models may not be sufficiently flexible to approximate the complex relationships involved.
The second aim is to extend advances made in our previous project period on simultaneous parameter estimation and variable selection in more flexible semiparametric versions of these models, developing new techniques that allow for arbitrary numbers of both parametric and nonparametric covariate effects, general outcome variables (e.g., continuous, binary), and adaptive identification of such effects. A key objective in many studies is to elucidate the association between features of longitudinal profiles of biomarkers or other continuous measures and a primary health outcome using so-called joint models. Standard joint models represent the subject-specific profiles via a mixed-effects model, e.g., as straight lines with random subject-specific intercepts and slopes, whose random parameters are included as covariates in a model for the primary outcome. In some settings, interest may focus on the association between the outcome and not only features such as slopes but also intra-subject variation in the longitudinal measure.
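As an illustration only, the standard joint model described above might be written as follows. The notation is assumed here and does not come from the application: Y_{ij} is the j-th longitudinal measurement on subject i at time t_{ij}, (b_{0i}, b_{1i}) are the random subject-specific intercept and slope, Z_i is the primary outcome, and g is a link function.

```latex
\begin{align*}
Y_{ij} &= b_{0i} + b_{1i}\, t_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2), \\
(b_{0i}, b_{1i})^{\top} &\sim N(\beta, D), \\
g\{E(Z_i \mid b_{0i}, b_{1i})\} &= \gamma_0 + \gamma_1 b_{0i} + \gamma_2 b_{1i}.
\end{align*}
```

In this sketch the intra-subject variance \sigma^2 is common to all subjects; a model in which it varies by subject and also enters the outcome model would be one way to express the intra-subject variation mentioned above.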
The third aim is to develop new methods for joint models involving both random intra-subject mean and variance parameters, exploiting techniques developed in the previous project period. Many longitudinal measures are censored due to limits of quantification of the assay used in their determination, and longitudinal analysis must account for this appropriately.
Our fourth aim focuses on development of new methods for mixed-effects models that not only address this issue but also draw on work in previous project periods to relax the usual normality assumption on random effects and yield an estimate of their density, providing the analyst with a tool for exploring underlying features of the population.
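To make the censoring issue concrete, the sketch below shows how a measurement falling below a limit of quantification can enter a likelihood. It is a deliberately simplified illustration, not the proposed methodology: it assumes independent normal measurements with a common mean and standard deviation, has no random effects, and treats censored values as left-censored at the limit. The function names are hypothetical.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(y, mu, sigma):
    """Distribution function of N(mu, sigma^2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(values, censored, mu, sigma):
    """Log-likelihood for normal data left-censored at a quantification limit.

    values[i] is the observed measurement, or the limit of quantification
    when censored[i] is True (assumed convention for this sketch).
    """
    total = 0.0
    for y, is_censored in zip(values, censored):
        if is_censored:
            # Only "true value <= limit" is known, so the contribution
            # is the probability of falling at or below the limit.
            total += math.log(normal_cdf(y, mu, sigma))
        else:
            # Fully observed value contributes its density.
            total += normal_logpdf(y, mu, sigma)
    return total
```

Replacing censored values with the limit itself, or dropping them, would ignore this distinction between density and probability contributions.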

Public Health Relevance

The research to be carried out in this project will provide health sciences researchers with new tools to build statistical models that can be used to learn about relationships among subject characteristics, such as physiologic, demographic, and genetic attributes and markers of disease progression, even if these are not measured precisely; treatments; and disease outcomes. These tools will enable them to use the models to identify key risk factors for disease and deleterious health outcomes. In addition, new methods will be developed for analyzing longitudinal biomarker measures reflecting disease progression when these are not always observed due to limits of the measuring device.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Dunn, Michelle C
North Carolina State University Raleigh
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States