Music is part of every culture on earth, and the enjoyment of music is nearly universal. Music performance is often highly collaborative; musicians harmonize their pitch, coordinate their timing, and reinforce each other's expressiveness to make music that resonates with the audience. This research envisions a human-computer collaborative music making system that allows people to collaborate with machines much as they collaborate with each other. This is of great significance: as interaction between humans and machines becomes deeper and broader, developing systems that allow us to collaborate with machines is a primary goal of research in cyber-human systems, robotics, and artificial intelligence. Project outcomes will advance the state of the art in automated accompaniment systems by empowering machines with much stronger music perception skills (audio-visual attending to individual parts in ensemble performances vs. monophonic listening), much more expressive music performance skills (expressive audio-visual rendering vs. timing adaptation of audio only), and much deeper understanding of music theory and composition rules (composition and improvisation skills vs. music theory novice). This project will showcase the powerful connection between music and technology, which has inspired generations of great multidisciplinary thinkers such as Pythagoras, Galilei, Da Vinci, and Franklin. The techniques developed in this project will be applied to augmented concert experiences through collaborations with the Eastman School of Music and the Chinese Choral Society of Rochester. Outreach to pre-college and college students will be accomplished through a variety of activities, including lab visits, a summer mini-course on "music and math," and teaching and advising in the unique and interdisciplinary Audio and Music Engineering program at the University of Rochester.
The project has four research thrusts with the following expected outcomes: 1) Attending to Human Performances: algorithms for machine listening and visual analysis of multi-instrument polyphonic music performances; 2) Rendering Expressive Machine Performances: computational models for expressiveness and audio-visual rendering techniques for expressive performances; 3) Modeling Music Language for Improvisation: computational models for compositional rules, and algorithms for music generation, harmonization, and improvisation; 4) System Integration: a human-computer collaborative music making system, and a set of design principles backed by subjective evaluations. The research will advance existing interaction mechanisms toward human-computer collaboration. It will also advance augmented reality from today's display of static objects toward more intelligent, dynamic, and collaborative augmented reality in music performances. The research on audio-visual analysis will advance both machine listening and visual understanding of audio-visual scenes in the music context. The research on visual rendering of expressive performances will open a new field of computational modeling of visual expressiveness in musical performances. The research on computational music language models is fundamental to many tasks in music informatics, including transcription, composition, and retrieval. Finally, the integration of analysis, performance, and music language modeling into a real-time collaborative system represents a new level of intelligent real-time computing.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.