The "preclinical" phase of Alzheimer's disease (AD) is characterized by abnormal levels of brain amyloid accumulation in the absence of major symptoms, can last decades, and potentially holds the key to successful therapeutic strategies. Today there is an urgent need for quantitative biomarkers and genetic tests that can predict clinical progression at the individual level. This project will develop cutting-edge machine learning algorithms that mine high-dimensional, multi-modal, and longitudinal data to derive models that yield individual-level clinical predictions in the context of dementia. The developed prognostic models will specifically use ubiquitous and affordable data types: structural brain MRI scans, saliva- or blood-derived genome-wide sequence data, and demographic variables (age, education, and sex). Prior research has demonstrated that all of these variables are strongly associated with clinical decline to dementia; however, to date no model can harvest all the predictive information embedded in these high-dimensional data. Machine learning (ML) algorithms are increasingly used to compute clinical predictions from high-dimensional biomedical data such as clinical scans. Yet most prior ML methods were developed for applications in which the "prediction" task concerned a concurrent condition (e.g., discriminating cases from controls), and they did not fully exploit established risk factors (e.g., age), multiple modalities (e.g., genotype and images), or longitudinal data. This application's core innovation will be to develop rigorous, flexible, and practical ML methods that can fully exploit multi-modal, longitudinal, and high-dimensional biomedical data to compute prognostic clinical predictions. The proposed project will build on the PI's strong background in computational modeling and analysis of large-scale biomedical data.
We will employ an innovative Bayesian ML framework that offers the flexibility to handle and exploit real-life longitudinal and multi-modal data. We hypothesize that the developed models will be more useful than alternative benchmarks for identifying preclinical individuals who are at heightened risk of imminent clinical decline. We will use a statistically rigorous approach for discovery, cross-validation, and benchmarking of the developed tools. This project will yield freely distributed, documented, and validated software and models for predicting future clinical progression from whole-genome, longitudinal structural MRI, and demographic data. We believe the algorithms and software we develop will yield invaluable tools for stratifying preclinical AD subjects in drug trials, optimizing future therapies, and minimizing the risk of adverse effects.

Public Health Relevance

Emerging technologies allow us to identify clinically healthy subjects harboring Alzheimer's pathology. While many of these preclinical individuals progress to dementia, sometimes quite quickly, others remain asymptomatic for decades. The proposed project will develop sophisticated data mining algorithms to derive models that can predict future clinical decline based on ubiquitous, easy-to-collect, and affordable data modalities: brain MRI scans, saliva- or blood-derived whole-genome sequences, and clinical and demographic variables.

National Institutes of Health (NIH)
National Institute on Aging (NIA)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Hsiao, John
Cornell University
Engineering (All Types)
Biomed Engr/Col Engr/Engr Sta
United States