Designing an efficient Huntington's disease (HD) early intervention clinical trial for individuals who have an expanded CAG repeat in the huntingtin gene requires identifying and combining clinical, biological, cognitive, and brain imaging markers to accurately distinguish subjects who will receive a diagnosis during a given intervention period from those who will not, and to track early changes in the disease course. The goal of this project is to identify sensitive biomarkers for HD risk stratification, indexing disease progression, and developing clinical trial endpoints. The proposal directly addresses two of the "4P's" of Medicine in the NIH New Strategic Vision: the markers will offer promising ways to predict when the disease will develop, and will increase the capacity to personalize early intervention based on the informative patient-specific markers our models identify. Combining biomarkers to predict HD onset and progression is an essential step in a continuum of research for development of disease-modifying therapies. Composite markers and their risk profiles created from our model will offer a quantitative way to monitor and compare potential interventions. Evidence collected from these comparisons will advance the development of efficacy studies in premanifest HD, where neuroprotective treatments would be most beneficial. We develop and apply a series of cutting-edge statistical learning methods based on support vector machine (SVM), variable selection, and dimension reduction to achieve these goals. These modern statistical methods, designed for correlated big data, have quickly become some of the most successful tools for hypothesis generation, classification, and prediction in biomedical studies. However, they have not yet been introduced to HD biomarker research.
In aim 1, using a counting process formulation, we propose an SVM that handles time-to-event outcomes (e.g., time to HD diagnosis) and combines markers into risk scores that discriminate subjects who will experience HD onset in the immediate future from those who will not, based on their personalized features. Although SVM is well studied for binary outcomes, it is far less explored for time-to-event outcomes. We fill this gap in knowledge.
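One possible reading of the aim 1 idea can be sketched in a few lines. This is an illustration under stated assumptions, not the proposal's actual method: at each observed event time, every subject still at risk becomes a binary example (+1 if the subject has the event at that time, -1 otherwise), and a linear SVM is fit to the pooled examples so that its linear predictor acts as a risk score. The function names `risk_set_examples` and `fit_linear_svm`, the class weighting, the tuning values, and the simulated data are all hypothetical.

```python
import numpy as np

def risk_set_examples(X, time, event):
    """Expand survival data into binary examples, one block per event time.

    At each observed event time t, every subject still at risk (time >= t)
    contributes one example: +1 if the subject has the event at t, else -1.
    """
    rows, labels = [], []
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        fails = (time == t) & (event == 1)
        rows.append(X[at_risk])
        labels.append(np.where(fails[at_risk], 1.0, -1.0))
    return np.vstack(rows), np.concatenate(labels)

def fit_linear_svm(Z, y, lam=0.01, epochs=300, lr=0.05):
    """Linear SVM fit by subgradient descent on an L2-penalized hinge loss.

    Positive examples are up-weighted (an assumption of this sketch),
    because events are rare within each risk set.
    """
    n, p = Z.shape
    w, b = np.zeros(p), 0.0
    n_pos = max(int((y > 0).sum()), 1)
    weights = np.where(y > 0, (y < 0).sum() / n_pos, 1.0)
    for _ in range(epochs):
        active = y * (Z @ w + b) < 1  # examples violating the margin
        wy = weights[active] * y[active]
        grad_w = lam * w - (wy[:, None] * Z[active]).sum(axis=0) / n
        grad_b = -wy.sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Simulated toy data (not HD data): only the first marker affects the hazard.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_beta = np.array([1.5, 0.0, 0.0])
T = rng.exponential(1.0 / np.exp(X @ true_beta))  # latent event times
C = rng.exponential(2.0, size=n)                  # censoring times
time = np.minimum(T, C)
event = (T <= C).astype(int)

Z, y = risk_set_examples(X, time, event)
w, b = fit_linear_svm(Z, y)
risk_score = X @ w  # larger score = higher estimated risk of earlier onset
```

In this sketch the fitted weight vector `w` plays the role of the marker combination, and `risk_score` is the resulting composite marker for each subject.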
In aim 2, we propose new learning methods for longitudinal outcomes that combine markers modifying the course of HD signs, in order to monitor the disease process and distinguish subjects with rapid progression from those with slower progression.
In aim 3, we propose to use novel and robust performance measures to compare derived combined markers with existing disease indices and key markers.
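The performance measures in aim 3 are not specified in this abstract. As a stand-in, the sketch below uses Harrell's concordance index, a standard measure for censored time-to-event data, to show how a combined marker could be compared with an existing index on the same subjects. The function name `concordance_index` and the toy numbers are hypothetical, and a higher score is assumed to mean higher risk (earlier event).

```python
import numpy as np

def concordance_index(time, event, score):
    """Harrell's C: fraction of comparable pairs ordered correctly by score.

    A pair (i, j) is comparable when subject i has an observed event and
    subject j is still under observation after that time. The pair is
    concordant when subject i has the higher score; ties in score count 1/2.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    score = np.asarray(score, dtype=float)
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]
        comparable += int(later.sum())
        concordant += float((score[i] > score[later]).sum())
        concordant += 0.5 * float((score[i] == score[later]).sum())
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy comparison of two hypothetical markers on four subjects.
time = [2, 4, 6, 8]
event = [1, 1, 0, 1]            # 0 = censored
combined_marker = [0.9, 0.7, 0.8, 0.1]
existing_index = [0.1, 0.7, 0.8, 0.9]

c_combined = concordance_index(time, event, combined_marker)
c_existing = concordance_index(time, event, existing_index)
```

A value near 1 means the marker ranks subjects in the order in which they experience onset, and 0.5 corresponds to random ordering.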
These aims will fundamentally advance our understanding of markers linked to HD onset and progression. The creation of statistical models for composite markers and risk profiles is especially useful for: (1) offering quantitative ways to monitor and compare potential interventions, and (2) improving the power of efficacy studies targeted at premanifest individuals by narrowing the predictive interval, so that future clinical trials can be made shorter and enroll fewer subjects. Finally, our improved predictions of HD onset and progression will make genetic counseling sessions more informative for pre-symptomatic subjects at risk of HD.

Public Health Relevance

The goal of Huntington's disease (HD) research is to develop experimental therapeutics to delay onset or slow disease progression, and to provide different treatment regimens at each disease stage. To meet this goal, this proposal develops and applies a series of advanced statistical approaches to rank and combine clinical, behavioral, and brain imaging markers to predict HD diagnosis in premanifest subjects during a given time period and to measure disease progression. The creation of models for composite markers and risk profiles is useful for offering quantitative ways to monitor and compare interventions and for powering clinical trials for premanifest HD individuals.

National Institutes of Health (NIH)
National Institute of Neurological Disorders and Stroke (NINDS)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZNS1)
Program Officer: Sutherland, Margaret L
Columbia University (N.Y.)
Biostatistics & Other Math Sci
Schools of Public Health
New York
United States
