This Small Business Innovation Research Phase I project will perform the research and development necessary to integrate additional information gathered from an existing phonetic word-spotting technology into a language and dialect identification system, thereby enhancing the identification system. The research objective is to use Nexidia's existing word-spotting technology to improve a state-of-the-art language identification system. Word spotting is a technique in which a word or phrase is searched for in audio; the search returns a set of timestamps where the word or phrase might have occurred, along with a confidence score for each timestamp. Current state-of-the-art language identification systems are based on Gaussian Mixture Models and phoneme statistics of each candidate language. They cannot use full speech recognition for computational reasons. Word spotting, however, is lightweight, needing only a fraction of a CPU. If a list of several thousand common words and phrases is generated for a language, it is very likely that speech more than a few seconds long will contain at least one item from the list. This project therefore proposes to begin with a state-of-the-art language identification system and augment it with such a search for each candidate language. The expected result is a language identification system capable of outperforming current state-of-the-art systems.
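The augmentation described above can be illustrated with a minimal sketch. It is not the project's actual design: the `Hit` structure, the confidence threshold, the additive weighting, and all function names are hypothetical, and the baseline scores stand in for whatever per-language scores a Gaussian Mixture Model or phoneme-statistics system would produce. The sketch assumes that a word-spotting search has already been run for each candidate language's word list and simply adds the resulting evidence to the baseline score.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One putative word-spotting hit (hypothetical structure)."""
    term: str
    timestamp: float   # seconds into the audio
    confidence: float  # assumed to lie in [0, 1]


def spotting_evidence(hits: List[Hit], threshold: float = 0.5) -> float:
    """Sum the confidences of hits at or above the threshold."""
    return sum(h.confidence for h in hits if h.confidence >= threshold)


def fuse_scores(baseline: Dict[str, float],
                hits_by_language: Dict[str, List[Hit]],
                weight: float = 1.0) -> Dict[str, float]:
    """Add weighted word-spotting evidence to each baseline score."""
    return {
        lang: score + weight * spotting_evidence(hits_by_language.get(lang, []))
        for lang, score in baseline.items()
    }


def identify(baseline: Dict[str, float],
             hits_by_language: Dict[str, List[Hit]],
             weight: float = 1.0) -> str:
    """Return the candidate language with the highest fused score."""
    fused = fuse_scores(baseline, hits_by_language, weight)
    return max(fused, key=fused.get)


# Illustrative values only; these are not measured results.
baseline_scores = {"english": -1.0, "spanish": -0.8}
hits = {
    "english": [Hit("thank you", 1.2, 0.9), Hit("hello", 0.3, 0.4)],
    "spanish": [Hit("gracias", 2.0, 0.3)],
}
print(identify(baseline_scores, hits))
```

In this toy input the baseline alone slightly favors Spanish, while a single confident English hit moves the fused decision to English. A real system would presumably learn the fusion weight and threshold from held-out data rather than fix them by hand.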
The ability to automatically classify which language is being spoken in a segment of speech would be a highly desirable feature in many speech communications systems. The proposed method for language identification is an extension of state-of-the-art systems. As such, the current state of the art serves as the performance baseline, and it is probable that the proposed research will result in better classification accuracy than is currently reported in the literature. If better accuracy is achieved, the proposed structure could become a standard. Further, there is no commercial product available at this time to perform language classification, as existing systems remain in research labs and have not been commercialized. Were the proposed research to be even moderately successful, a new class of commercial offering would emerge. Possible applications include routing, monitoring, and quality assurance in call centers; data mining and intelligence applications; and selecting the proper speech recognition system. Call centers could automatically route incoming calls to appropriate customer service representatives (CSRs), and surveillance operations could add filtering criteria to their intercepted records. Integrating this feature with the original functionality of fast phonetic keyword spotting would greatly enhance data-mining capability.