Alzheimer disease (AD) is the most common neurodegenerative disorder. Pathological changes in the brain can be observed at least 15 years before clinical symptoms appear (the preclinical stage). An early and accurate diagnostic tool could save $7.9 trillion in medical and care costs. Moreover, an effective therapeutic strategy could improve the clinical outcome if delivered early. There is a clear need to develop cost-effective and non-invasive biomarkers for AD that can identify individuals before symptoms emerge as well as patients at early symptomatic stages of disease. These novel biomarkers could also be leveraged to monitor disease progression and responses to therapies. Diagnostic tests based on cell-free nucleic acids have revolutionized prenatal screening as well as cancer research, diagnosis, and treatment. Furthermore, specific transcripts ascertained from cell-free RNA have been evaluated as biomarkers for AD, but so far no high-throughput approach has been attempted. The goal of this proposal is to use high-throughput sequencing of cell-free nucleic acids from plasma to construct a prediction model for neurodegenerative diseases. I hypothesize that there are detectable changes in plasma cell-free nucleic acids that are related to AD. During the K99 phase, I aim to accurately predict AD cases using cell-free nucleic acids and bioinformatics tools, including machine learning. Briefly, I will sequence cell-free RNA present in longitudinal plasma samples from AD cases and controls, then build a predictive model. I will replicate this model in an independent dataset of preclinical samples. To validate the model, I will include samples from mutation carriers and from individuals of non-European ancestry. I will also determine whether the model can predict other neurodegenerative diseases or is specific to AD by quantifying plasma transcripts from patients with other neurodegenerative diseases. My preliminary data show that this approach is feasible.
I designed a preliminary predictive model with 10 AD cases and 10 controls that achieved an area under the ROC curve (AUC) of 1; I then replicated it in independent samples (n=20) with an AUC of 0.84. In four preclinical samples the AUC was 0.86, suggesting that my model can also identify pre-symptomatic individuals. It is possible to improve this model by using more powerful informatics approaches. Using deep neural networks, I obtained an AUC of 1 in the discovery dataset and 0.94 in the replication dataset. During the R00 phase, I plan to use the same approach on other neurodegenerative diseases to design disease-specific predictive models. I will generate sequence data on the RNA present in longitudinal plasma samples of cases and controls from Parkinson's disease and dementia with Lewy bodies to construct a specific predictive model for each disease. I will then replicate the models in preclinical samples of these diseases. Combining the information on all neurodegenerative diseases will also allow me to refine the predictive model and perform integrative analyses that may yield mechanistic insights. My ultimate goal is to use the predictive models as diagnostic tools and, if possible, as early diagnostic tests. The preliminary data are encouraging and open the possibility of plasma-based tests for neurodegeneration.
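As a minimal illustration of how the discovery and replication figures above are scored, the sketch below computes the area under the ROC curve from model scores using the Mann-Whitney formulation. The function name `roc_auc` and all labels and scores are hypothetical toy values chosen for illustration; they are not data or code from this proposal, and the actual analysis pipeline may differ.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen case (label 1) receives
    a higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy scores (NOT data from the proposal).
discovery_labels = [1, 1, 1, 0, 0, 0]
discovery_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # cases fully separated
replication_labels = [1, 1, 1, 0, 0, 0]
replication_scores = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]  # one case ranked below a control

discovery_auc = roc_auc(discovery_labels, discovery_scores)
replication_auc = roc_auc(replication_labels, replication_scores)
print(f"discovery AUC = {discovery_auc:.2f}, replication AUC = {replication_auc:.2f}")
```

An AUC of 1 in a small discovery set means every case outscored every control in that set, which is why performance in independent replication samples is the more informative estimate.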
This proposal will use transcriptomic data from plasma to build a predictive tool for Alzheimer disease using bioinformatics and machine learning approaches. This innovative approach could provide a cost-effective and minimally invasive biomarker for Alzheimer disease, which is critical for early (preclinical) detection, diagnosis, disease intervention, and monitoring. If successful, the results will lead to a breakthrough in the field.