This Small Business Innovation Research (SBIR) Phase I project will perform the research and development needed to greatly enhance the information retrieval capability of a fast phonetic word-spotter. The completed research will lead to new methods for spoken document retrieval and classification on low-quality telephony audio or multimedia digital sources. Spoken document retrieval is a well-researched problem in the domain of broadcast news. However, many applications exist where users must retrieve and classify documents with lower-quality audio. The most commonly applied method converts an audio stream or file into a hypothesized sequence of words (Speech-to-Text, or STT) and then applies text-based information retrieval. Although this method has been shown to be effective for broadcast news document retrieval, it has drawbacks. For example, STT's explicit use of language models limits the hypothesized word sequences to words within its lexicon. Phonetic matching, by contrast, can identify likely instances of keywords, such as names, that are not in a lexicon. One advantage of the STT approach is that text-based information retrieval methods apply directly, and these work well on high-quality audio where error rates are fairly small. Over a high-volume telephony channel, however, the computational burden and low accuracy make STT impractical, and better solutions are necessary. The goal of the proposed project is to research and develop phonetic-based document retrieval and classification algorithms. Retrieval systems based on phonetic searches will be evaluated and compared on large existing corpora.
The key innovation of the proposed research is to adapt search techniques to function in environments where audio exists but text does not. Scientifically, the algorithms must operate in a probabilistic framework, because phonetic word spotting returns matches with confidence measures rather than exact hits. Commercially, existing multimedia or audio archives will become available for data mining. In addition, decisions about document type (e.g., was the phone call to the call center a complaint?) open commercial applications in market intelligence, security analysis, quality analysis, and any call segregation application.
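
As a rough illustration of what retrieval in a probabilistic framework could look like, the sketch below replaces exact term counts with expected counts, obtained by summing the confidence of each putative keyword hit in a document. This is not the project's algorithm. The `Hit` record, the function names, and the assumption that confidences lie in [0, 1] are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One putative keyword match from a word-spotter (hypothetical record)."""
    doc_id: str
    keyword: str
    confidence: float  # assumed to lie in [0, 1]


def expected_counts(hits):
    """Sum hit confidences per (document, keyword) as a soft term count."""
    counts = defaultdict(lambda: defaultdict(float))
    for hit in hits:
        counts[hit.doc_id][hit.keyword] += hit.confidence
    return counts


def rank_documents(hits, query_keywords):
    """Rank documents by the total expected count of the query keywords."""
    counts = expected_counts(hits)
    scores = {
        doc_id: sum(per_keyword.get(kw, 0.0) for kw in query_keywords)
        for doc_id, per_keyword in counts.items()
    }
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: two calls with made-up hits and confidences.
example_hits = [
    Hit("call_a", "refund", 0.9),
    Hit("call_a", "refund", 0.4),
    Hit("call_b", "refund", 0.3),
    Hit("call_b", "manager", 0.8),
]
ranking = rank_documents(example_hits, ["refund"])
```

Under these assumptions, a low-confidence hit still contributes to a document's score but counts for less than a high-confidence one, so no hard accept/reject threshold is needed before ranking. The same soft counts could serve as features for a document-type classifier.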