The focus of this project is to develop techniques to objectively (automatically) measure spoken language variability and change in aging. Many of the most effective methods for cognitive assessment are mediated by observed behavior, particularly spoken language production. These include clinical instruments, e.g., the Mini-Mental State Examination (MMSE), but also less formal assessments involving interviews or dialogs with physicians or even friends and family. Behavioral changes noted through these spoken language interactions could indicate pathological changes associated with a disorder, or they may be transient, due to missed medication or depression at the time of assessment. Alternatively, the observed behavior may simply reflect normal change in spoken language with aging, or may even fall within the range of natural behavioral variation. Understanding normal versus pathological language change with age requires the collection and annotation of repeated samples from both healthy and impaired individuals. This project has three specific aims: 1) to collect and transcribe longitudinal spoken language sample data elicited in multiple ways from diverse elderly adults; 2) to develop algorithms for automatically extracting features from these spoken language samples; and 3) to characterize the variability of feature values across samples from the same individual, and the utility of feature values, and even feature variances, for discriminating between subject groups. A particular challenge being addressed by this research is to achieve high-quality, efficient automatic annotation of discourse structure for the spoken language samples. The resulting methods are expected to contribute directly to important behavioral assessment applications.

Project Report

This project focused on understanding and automatically measuring the language changes that result from typical aging, as well as language changes due to neurological disorders. These topics were pursued through the collection and transcription of spoken language data for analysis and the development of new algorithms to automate natural language analysis. The central intellectual merit of the project resided in the design of new algorithms that automatically analyze spoken language samples and derive measures characterizing the samples in ways that are meaningful for both language assessment and clinical diagnosis. These systems for automatic language analysis were a particularly important dimension of the project, because detailed analysis of language samples by human experts is very expensive and time consuming, to the point of precluding the widespread adoption of such analysis, no matter how informative. The elderly population is growing explosively and has long since reached the point where expert human analysis cannot keep up with the amount of language that ideally needs to be analyzed. The project therefore had a broader impact on the access of individuals and clinicians to technology for effectively and efficiently assessing the characteristics of an individual's use of language, and changes in those characteristics, whether due to natural brain aging or other causes.
A corpus of spoken language was collected from over 50 seniors during 30-minute sessions recorded every six months over several years. The spoken language in the sessions was elicited from participants via a number of different prompts, such as having them retell a story or describe life events. Project members developed new algorithms for automatically measuring characteristics of language samples, such as various types of language complexity measures or the accuracy of the retelling of a particular story.
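As a rough illustration of what a simple language complexity measure can look like, the following minimal sketch computes two surface statistics from a transcribed sample: mean utterance length in words and type-token ratio. These two measures, the function name `complexity_features`, and the regular-expression tokenization are assumptions chosen for illustration; they are not taken from the project, whose actual feature set is presented in its publications.

```python
import re

def complexity_features(utterances):
    """Return simple surface measures for a list of transcribed utterances.

    Hypothetical helper for illustration only; not the project's feature set.
    """
    # Lowercase each utterance and keep alphabetic tokens (with internal apostrophes).
    tokenized = [re.findall(r"[a-z]+(?:'[a-z]+)?", u.lower()) for u in utterances]
    tokenized = [t for t in tokenized if t]  # drop utterances with no word tokens
    tokens = [w for t in tokenized for w in t]
    if not tokens:
        return {"mean_utterance_length": 0.0, "type_token_ratio": 0.0}
    return {
        # Average number of word tokens per utterance.
        "mean_utterance_length": len(tokens) / len(tokenized),
        # Distinct word forms divided by total word tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }

sample = ["She went to the store.", "She bought bread and milk."]
features = complexity_features(sample)
```

Computing such values for each session of the same participant would give one feature trajectory per person, which is the kind of input needed to study within-individual variability over time.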
This project resulted in new methods that can be used to screen individuals for early indications of age-related disorders such as Alzheimer's-related dementia. Many of the most interesting and significant results in the project came from automated analysis of narrative (story) retellings. Such retelling tasks are a common part of many neuropsychological exams, for both adults and children, as a means of testing memory and expressive language. We demonstrated that a collection of automatically extracted features capturing characteristics of the language used -- without considering whether the retelling was actually related to the original story -- allowed for relatively accurate automatic classification of individuals into two groups: the typically aging and those diagnosed with mild cognitive impairment, also known as incipient dementia. Later in the project, we developed new algorithms (related to some widely used in machine translation) to analyze retellings for their fidelity to the original narrative, and with these methods we were also able to accurately classify individuals with neurological impairment versus typical controls. Fully automated methods, which apply automated language analysis to the output of an automatic speech recognizer, were demonstrated to preserve accurate classification performance. Multiple publications provide a detailed presentation of the algorithms, overall systems and experimental results for these approaches and others. While many of the methods in this project were developed to derive features useful for a specific disorder (e.g., Alzheimer's) or for a specific type of language sample (story recall), many of the algorithms were general enough to be applied to language samples pertaining to other neurological disorders (e.g., autism spectrum disorder) or to other language sample types (e.g., picture book narration).
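To make the idea of scoring a retelling for fidelity concrete, here is a minimal sketch that measures what fraction of the source story's content words reappear in the retelling. This is a deliberately simple stand-in, not the alignment-based approach described above; the function names, the tiny stopword list, and the example sentences are all hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "and", "to", "of", "in",
             "her", "his", "she", "he", "was", "were"}

def content_words(text):
    """Return the set of lowercase alphabetic words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def retelling_recall(source, retelling):
    """Fraction of the source's content words that also occur in the retelling."""
    src = content_words(source)
    if not src:
        return 0.0
    return len(src & content_words(retelling)) / len(src)

story = "Anna lost her purse in the market."
retell = "A woman named Anna lost a purse."
score = retelling_recall(story, retell)
```

A score like this could serve as one input feature to a classifier. Exact word overlap misses paraphrases, which is one reason a word-alignment approach is more robust.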
A broad range of methods was ultimately investigated, and significant findings included new methods that identify off-topic words and utterances within language samples automatically and with high accuracy. Over 15 papers were published during the course of this project, presenting a range of new methods in the growing area of automatic clinical language sample analysis.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Type: Standard Grant (Standard)
Application #: 0826654
Program Officer: Amber L. Story
Project Start:
Project End:
Budget Start: 2008-11-01
Budget End: 2014-04-30
Support Year:
Fiscal Year: 2008
Total Cost: $762,000
Indirect Cost:

Name: Oregon Health and Science University
Department:
Type:
DUNS #:
City: Portland
State: OR
Country: United States
Zip Code: 97239