Producing human speech requires the exquisite timing of multiple interacting modalities: movements of the lips, tongue, velum, and vocal folds are precisely coordinated to give each sound its unique properties. Understanding how these modalities interact is important to basic linguistic and cognitive science as well as to applied research areas such as automatic speech recognition, speech therapy, and second language learning. This workshop brings together experts in speech database construction, speech recognition, and articulatory research to explore the feasibility and desirability of developing software and a database for studying how the speech articulators interact and how their coordination relates to the sounds produced. These three communities will address (i) requirements of software for extracting and analyzing articulatory data in conjunction with the acoustic signal, and (ii) properties of the database to be constructed, namely which utterances should be recorded, how to synchronize capture of the multiple modalities, and how the data should be marked up, annotated, and stored.
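The synchronization, markup, and storage questions above can be made concrete with a small sketch. The Python dataclasses below are purely illustrative, not a proposed UltraSpeech schema: they show one way a recording could bundle the audio track, per-modality articulatory streams with their own sampling rates and offsets, and interval annotations, all referenced to a shared audio clock. Every name and field here is an assumption introduced for illustration.

```python
# A minimal sketch (not the actual UltraSpeech design) of one synchronized
# recording: each articulatory stream keeps its own sampling rate and a
# start-time offset relative to the shared audio clock, and annotations are
# stored as labeled time intervals. All names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArticulatoryStream:
    modality: str          # e.g. "ultrasound_tongue", "lip_video", "EGG"
    sample_rate_hz: float  # frame or sample rate of this modality
    offset_s: float        # start time relative to the audio track, in seconds
    samples: List[float] = field(default_factory=list)

@dataclass
class IntervalAnnotation:
    tier: str              # e.g. "phone", "word", "gesture"
    label: str
    start_s: float
    end_s: float

@dataclass
class Recording:
    speaker_id: str        # links to demographic metadata (age, dialect, ...)
    utterance: str         # prompt text that was recorded
    audio_sample_rate_hz: float
    audio: List[float] = field(default_factory=list)
    streams: List[ArticulatoryStream] = field(default_factory=list)
    annotations: List[IntervalAnnotation] = field(default_factory=list)

    def stream_index_at(self, stream: ArticulatoryStream, time_s: float) -> int:
        """Map a time on the shared audio clock to a frame index in one stream."""
        return round((time_s - stream.offset_s) * stream.sample_rate_hz)
```

The design choice this sketch highlights is that synchronization reduces to storing, for each modality, its sampling rate and its offset from a single reference clock, so that any annotation interval can be projected onto any stream.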
The goal of the project is to develop both software for multi-modal speech analysis and a database of synchronized multi-modal speech recordings. A critical first step is to assess which features of the database and software are needed to maximize their long-term value to the scientific community. This workshop brings together researchers from diverse backgrounds in the human language and computer sciences to examine these issues, so that the database and software can facilitate studies of oral tract articulation across many disciplines: understanding the diversity of human language sounds, language acquisition, and endangered languages; exploring speech deficits; teaching foreign language pronunciation, or oral language to the profoundly deaf; improving speech recognition and synthesis software; and understanding how musicians shape sounds while playing wind instruments.
Context and description of project

Human language integrates several complex systems. Sound is the most accessible of these, with measurable acoustic and articulatory properties. Understanding how sound works in language is important to the basic linguistic and cognitive sciences as well as to applied research such as automatic speech recognition, language learning, and correcting articulatory disorders. However, the use of sound in human language is only marginally understood, because articulation involves multiple interacting modalities: the lips, the velum, the tongue, and the larynx all conspire to give each sound its unique properties. While there are tools for tracking and analyzing the movements of each articulator, there is currently no easy means of synchronizing and integrating data from all articulators. The goal of this project was to determine the feasibility and desirability of developing software and a database to enable the study of the interaction of these different modalities. Our vision has two components: (1) UltraPraat, to integrate articulatory analysis with Praat (www.fon.hum.uva.nl/praat/), open source software widely used for acoustic analysis; and (2) UltraSpeech, to design and build a database for accessing simultaneous articulatory and acoustic speech data, including the demographic data of speakers.

The primary activity was a workshop bringing together experts in articulation, speech synthesis, speech recognition, and speech education and therapy. The goal of the workshop was to develop a plan ensuring that the infrastructure built to fill the current gap in articulatory data and analysis software will provide as much benefit as possible to all the related fields in technology and the sciences. A secondary activity was to follow up on recommendations from the workshop. The broadest impact of this project will be felt when the software and database are developed and made available to other researchers. The intended users include anyone who studies oral tract articulation, e.g. to understand the diversity of human language sounds, language acquisition, and endangered languages; explore speech deficits; teach foreign language pronunciation or oral language to the profoundly deaf; improve speech recognition and synthesis software; or understand how musicians shape sounds while playing wind instruments.

Project outcomes

The workshop brought together over 30 people for two days. It began with a discussion of the issues surrounding articulatory analysis, followed by a sketch of the vision for UltraPraat and UltraSpeech. The large group then broke into smaller groups to focus on specific aspects of the two components of the envisioned tools. Four themes recurred across the sub-groups. First, the cost of developing an articulatory dataset is prohibitive for individual labs; developing the tools and corpus and making them public would allow individual labs to help populate the database with their own recordings, enhancing the data broadly available for research. Second, and relatedly, articulatory phonetics studies with more than 5-10 subjects are rare because of the difficulty of collecting and extracting data. The data extraction and analysis software addresses this point directly by greatly facilitating both extraction and analysis of articulatory data.
Similarly, the strong recommendation to build an initial database of 60 speakers, spread evenly across three dialects, would provide a significant data resource for further studies. Third, a recurring recommendation was that the structure be as open as possible: as more people access the analytic tools and the database, more research questions will be asked, and it is impossible to anticipate the full potential. Coupled with this was considerable discussion of ways to encourage others to add to the database, so that the value of the toolset increases multiplicatively. Finally, the initial roll-out of the software and database should include a research study showing what can be done with the resources, in order to provide a model for other researchers. The recurrent comment on this topic was that people will be more likely to recognize the value of the software and database if they are presented with results drawing on those resources. Ensuing discussion also brought out the importance of developing prototypes of UltraPraat and UltraSpeech to demonstrate to other researchers what the overall project could make possible.

Subsequent efforts funded by this grant focused on developing UltraPraat. PI Archangeli secured a second NSF grant (for 3 years) to work further on development of the prototypes. Several graduate students and undergraduates have been involved in different components of the project, both in the workshop (presenting and attending) and in the subsequent software development, giving the students involved the opportunity to learn valuable lessons about conducting research. Finally, two graduate students who began their graduate careers intending to get MA degrees are now in a PhD program, and a further graduate student was inspired to apply for an NSF Doctoral Dissertation Improvement grant.