The most fundamental aspect of statistical modeling in speech recognition, the use of linear Gaussian statistics, has remained essentially unchanged over the past 20 years. Fundamental advances are required if speech recognition is to become a pervasive technology in the myriad applications requiring robustness to severe noise (e.g., cell phones and in-vehicle automotive applications). Nonlinear statistical models for speech were first proposed in the early 1980s, when fractals and related techniques promised great advances in compression. Since then, progress has been slow but steady. Recent advances in several areas of speech processing, such as pitch determination and speech modeling, together with staggering advances in computational resources, suggest that these models are now viable for traditional problems such as speaker recognition, speaker verification, and speech recognition. Nonlinear dynamics provides a framework that supports parsimonious statistical models that may overcome many of the limitations of current hidden Markov model (HMM) based techniques.
This research involves extending the traditional supervised-learning HMM paradigm to support a chaotic acoustic model that incorporates a nonlinear statistical model of the observation vectors, and then evaluating the impact of this model on text-independent speaker verification. The primary goal is to understand acoustic variation at the phonetic level in a more comprehensive and efficient manner. The proposed research could go far toward demonstrating the practicality of nonlinear speech modeling. In addition, the computational tools and resources to be developed are expected to enhance the existing infrastructure for Internet-accessible speech recognition while promoting a better understanding of speech in both research and education.
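For illustration only, the sketch below shows one way a nonlinear observation model of this general kind might be set up: the emission density attached to an HMM state is fit over a time-delay (phase-space) embedding of the waveform rather than over conventional linear-Gaussian cepstral features. The embedding parameters (dim, tau), the Gaussian-mixture density, and the scikit-learn dependency are assumptions made for this example, not elements of the proposed model.

import numpy as np
from sklearn.mixture import GaussianMixture


def delay_embed(signal: np.ndarray, dim: int = 3, tau: int = 8) -> np.ndarray:
    """Reconstruct a phase space from a 1-D signal via time-delay embedding.

    Returns an array of shape (N - (dim - 1) * tau, dim) whose rows are the
    points [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)].
    """
    n = len(signal) - (dim - 1) * tau
    return np.stack([signal[i * tau:i * tau + n] for i in range(dim)], axis=1)


class PhaseSpaceObservationModel:
    """Per-state observation density over reconstructed phase-space points.

    Plays the role of a conventional HMM state output density, but is fit to
    the embedded (nonlinear) geometry of the waveform. The mixture density and
    parameter choices here are illustrative assumptions.
    """

    def __init__(self, n_components: int = 4, dim: int = 3, tau: int = 8):
        self.dim, self.tau = dim, tau
        self.gmm = GaussianMixture(n_components=n_components, covariance_type="full")

    def fit(self, frames: list[np.ndarray]) -> "PhaseSpaceObservationModel":
        # Pool the embedded points from all training frames assigned to this state.
        points = np.vstack([delay_embed(f, self.dim, self.tau) for f in frames])
        self.gmm.fit(points)
        return self

    def log_likelihood(self, frame: np.ndarray) -> float:
        # Frame-level score: total log-density of the embedded points, usable
        # as the state emission log-probability during decoding.
        return float(self.gmm.score_samples(delay_embed(frame, self.dim, self.tau)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic quasi-periodic "voiced" frames stand in for real speech data.
    t = np.arange(400)
    frames = [np.sin(2 * np.pi * t / 80) + 0.05 * rng.standard_normal(t.size)
              for _ in range(20)]
    model = PhaseSpaceObservationModel().fit(frames)
    print("frame log-likelihood:", model.log_likelihood(frames[0]))

In a speaker-verification setting, a score of this form could replace the usual Gaussian emission term for each state, leaving the HMM transition structure and supervised training procedure otherwise unchanged.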