Quantitative acoustic models of segmental timing in spoken English, such as those developed for text-to-speech synthesis (TTS), acknowledge that segment durations in connected speech reflect the combined influence of systematic factors and nonsystematic, or random, factors. Systematic variability in segment durations reflects factors such as context, stress, speaking style or register, and cognitive load. Segment durations also reflect within-speaker variability, termed random variability, that cannot be attributed to any of these systematic factors. An individual talker's speech duration patterns can therefore be mathematically characterized in terms of the magnitude of the effect of each systematic factor (e.g., the amount of lengthening associated with word stress), as well as in terms of the relative and absolute amounts of systematic and random variability. Importantly, this modeling framework can be applied to meaningful sentence productions and can isolate the effects of individual systematic factors without requiring artificial speech materials. This approach to quantitatively modeling segmental timing has also proven crucial for synthesizing intelligible, natural-sounding speech in TTS. Given the importance of this modeling framework for generating high-quality speech synthesis, it is surprising that similar modeling efforts have not been applied to dysarthria as a means of understanding the source of reduced intelligibility and naturalness in this speech disorder. Aberrancies in the temporal patterning of speech are present in most persons with dysarthria, and a variety of studies suggest that speech duration variables contribute to intelligibility and naturalness. The approach used in many existing studies is to document whether speech durations in dysarthria are, on average, atypically short, long, or variable compared to normal speech.
The TTS modeling framework described above, however, goes beyond this type of simple description. It identifies the relative contribution of each systematic factor influencing an individual speaker's segment durations, as well as the combined relative and absolute contributions of systematic and random factors to that speaker's segmental timing. The framework further allows model parameters for an individual speaker to be manipulated via speech synthesis to determine their impact on intelligibility and naturalness. The proposed exploratory project seeks to apply such a quantitative modeling framework to segment durations in sentences produced by speakers with a variety of neurological diagnoses and dysarthrias. The perceptual relevance of model parameters will be studied via speech resynthesis to determine their impact on judgments of intelligibility and naturalness.
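As an illustration of the kind of decomposition described above, the following minimal sketch fits an additive model to log segment durations and splits the total variance into a systematic part (explained by the factors) and a random part (the residual). The additive log-duration form, the two binary factors (`stressed`, `phrase_final`), the function name `fit_duration_model`, and the synthetic data are all assumptions made for illustration; they are not necessarily the model or factors that the project would use.

```python
import numpy as np


def fit_duration_model(log_durations, factors):
    """Least-squares fit of log-duration = intercept + factor effects + residual.

    `factors` maps a (hypothetical) factor name to a numeric coding per segment.
    """
    y = np.asarray(log_durations, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(f, dtype=float) for f in factors.values()]
    X = np.column_stack(columns)

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    residual = y - fitted

    # With an intercept in the model, var(y) = var(fitted) + var(residual).
    systematic_var = fitted.var()
    random_var = residual.var()
    return {
        "intercept": coef[0],
        "effects": dict(zip(factors.keys(), coef[1:])),
        "systematic_var": systematic_var,
        "random_var": random_var,
        "systematic_share": systematic_var / y.var(),
    }


# Synthetic example: the effect sizes and noise level are invented for illustration.
rng = np.random.default_rng(0)
n_segments = 200
stressed = rng.integers(0, 2, n_segments)
phrase_final = rng.integers(0, 2, n_segments)
log_dur = (
    np.log(80.0)                         # baseline duration of 80 ms (assumed)
    + 0.3 * stressed                     # lengthening under stress (assumed)
    + 0.4 * phrase_final                 # phrase-final lengthening (assumed)
    + rng.normal(0.0, 0.1, n_segments)   # random within-speaker variability
)

result = fit_duration_model(
    log_dur, {"stressed": stressed, "phrase_final": phrase_final}
)
print(result["effects"])
print(result["systematic_share"])
```

In this sketch the fitted effects play the role of the per-factor magnitudes for one speaker, and `systematic_var` and `random_var` give the absolute amounts of each kind of variability, with `systematic_share` as their relative proportion.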
Effective and efficacious treatment of reduced intelligibility and naturalness in dysarthria requires knowledge of the factors underlying these functional limitations. The proposed exploratory project seeks to apply a quantitative model of segmental timing, developed for text-to-speech synthesis, to persons with dysarthria, for whom anomalies in the temporal patterning of speech are common. Findings from this project will provide a new and comprehensive model of aberrancies in the temporal patterning of speech in dysarthria; the contribution of model parameters to perceptual judgments of intelligibility and naturalness will also be determined.