Abstract

One of the key problems facing us in the 21st century is information retrieval and management. Finding ways to automatically index, label, and access multimedia content (such as music documents) in meaningful ways is an open research question that increases in importance as multimedia databases proliferate and grow. Music collections, such as the 3.5 million recordings in Apple Computer's iTunes repository, comprise one of the most popular categories of on-line multimedia content. For scholars, musicians and even casual listeners, the music document is only the beginning, a tool to initiate the task at hand. Musicians may be interested in remixing a musical recording even though all they have available is the final mix. Scholars may wish to analyze the harmonies in a piece. Others may want karaoke that follows the singer's expressive timing, or a way to remove the sound of an unwanted cell phone ring from a recording of their daughter's flute recital.

The objective of this research is to develop two key facilitating technologies to enable these kinds of interactions: score alignment and source separation. Score alignment involves aligning an audio performance to the events in a machine-readable music score. When aligned to a score, a performance can be addressed by melodic and harmonic content. We propose to advance the state of the art by enabling a machine to follow partially specified scores (such as Jazz lead sheets). This alignment requires significant inference about likely surface structures (the note sequence in an improvised solo) from deeper structural descriptions in the score (the chords in a lead sheet). This will enable alignment of entire classes of music, such as much Jazz, Pop and Rock, that cannot currently be aligned to scores.

The second technology, source separation, is the process of isolating individual source signals, given mixtures of those signals. With source separation, individual instruments and sounds can be accessed, identified and manipulated in ways beyond the power of commercial audio search and editing software. We will advance the field through score-informed separation, as well as new iterative methods for approximating source models from acoustic mixtures. The idea is to develop a synergistic system for music information retrieval and interaction that uses multiple document modalities (written scores, audio files, MIDI) to infer more about the music structure than is possible using a single modality.

This research will impact the signal-processing community (source separation), the music information retrieval community (music indexing and search) and the artificial intelligence community (tools for intelligent abstraction of real-world data). To broadly disseminate the work, demonstration tools will be made available over the internet and results will be published in relevant journals and conferences. The PI is committed to involving undergraduates and members of historically underrepresented groups in research, working with the SROP and UROP programs to make this happen. The PI also teaches the course "Machine Perception of Music," where research results will be disseminated to a wide variety of students.
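To make the score-alignment task concrete, below is a minimal sketch of one standard baseline: offline alignment by dynamic time warping (DTW) over chroma features, assuming the score has first been rendered to audio (e.g., from a MIDI file). This illustrates the general family of techniques the proposal builds on, not the project's own algorithm; the function and file names are hypothetical.

    import librosa

    def align_performance_to_score(perf_path, score_audio_path, sr=22050, hop=512):
        """Return (performance_frame, score_frame) index pairs along the DTW path."""
        perf, _ = librosa.load(perf_path, sr=sr)
        score, _ = librosa.load(score_audio_path, sr=sr)

        # Chroma features summarize harmonic content per frame, making the
        # comparison robust to differences in timbre and instrumentation.
        chroma_perf = librosa.feature.chroma_cqt(y=perf, sr=sr, hop_length=hop)
        chroma_score = librosa.feature.chroma_cqt(y=score, sr=sr, hop_length=hop)

        # DTW finds a monotonic warping path that minimizes the cumulative
        # cosine distance between the two chroma sequences.
        _, wp = librosa.sequence.dtw(X=chroma_perf, Y=chroma_score, metric='cosine')
        return wp[::-1]  # path ordered from the start of the piece to the end

Multiplying the returned frame indices by hop / sr converts them to seconds, so each score event can be mapped onto the performance timeline. Following a lead sheet, as proposed here, is harder: the surface notes must first be inferred from the chord symbols rather than read directly from the score.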

Project Report

One of the key problems facing us in the 21st century is information retrieval and management. Finding ways to automatically index, label, and access multimedia content (such as music documents) in meaningful ways is an open research question that increases in importance as multimedia databases proliferate and grow. Music collections, such as the 3.5 million recordings in Apple Computer's iTunes repository, comprise one of the most popular categories of on-line multimedia content. For scholars, musicians and even casual listeners, the music document is only the beginning, a tool to initiate the task at hand. Musicians may be interested in remixing a musical recording even though all they have available is the final mix. Scholars may wish to analyze the harmonies in a piece. Others may want karaoke that follows the singer's expressive timing, or a way to remove the sound of an unwanted cell phone ring from a recording of their daughter's flute recital.

The objective of this research was to develop two key facilitating technologies to enable these kinds of interactions: score alignment and source separation. Score alignment involves aligning an audio performance to the events in a machine-readable music score. When aligned to a score, a performance can be addressed by melodic and harmonic content. We proposed to advance the state of the art by enabling a machine to follow partially specified scores (such as Jazz lead sheets). This alignment requires significant inference about likely surface structures (the note sequence in an improvised solo) from deeper structural descriptions in the score (the chords in a lead sheet). This enables alignment of entire classes of music, such as much Jazz, Pop and Rock, that could not be aligned to scores prior to our work. We developed new fundamental algorithms to allow such alignment, enabling new kinds of editing and remixing of audio that were not previously possible for music that has a loosely specified lead sheet rather than the note-for-note score found in classical music.

The second technology, source separation, is the process of isolating individual source signals, given mixtures of those signals. With source separation, individual instruments and other sounds (such as human voices) can be accessed, identified and manipulated in ways beyond the power of commercial audio search and editing software. We advanced the field through both score-informed separation and repetition-based source separation. This work has also resulted in a patent application for repetition-based source separation. We further anticipate that the source separation algorithms we use can, in the future, be applied to non-music problems, such as enhancement of audio for hearing aids.

One goal of this work was to make steps toward a synergistic system for music information retrieval and interaction that uses multiple document modalities (written scores, audio files, MIDI) to infer more about the music structure than is possible using a single modality. We have developed a demonstration tool that separates a mixture of music (such as a string quartet) into individual tracks by aligning the audio to a MIDI file containing the written score. This allows remixing of the audio even when individually recorded tracks do not exist (e.g., a live recording of a string quartet to a stereo microphone).
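As a concrete illustration of the idea behind this demonstration tool, the sketch below builds time-frequency masks from a score that has already been aligned to the audio: while a note sounds, it claims the spectrogram bins near its harmonics, and each instrument's mask collects the bins of its notes. This is a simplified sketch under stated assumptions, not the released tool; the note-list format, function name, and parameters are hypothetical.

    import numpy as np
    import librosa

    def masks_from_aligned_score(notes, sr, n_fft, n_frames, hop=512,
                                 n_harmonics=10, half_width=2):
        """notes: iterable of (instrument, midi_pitch, start_sec, end_sec)
        tuples, already aligned to the audio. Returns a dict mapping each
        instrument to a binary mask of shape (1 + n_fft // 2, n_frames)."""
        freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
        masks = {}
        for inst, pitch, t0, t1 in notes:
            mask = masks.setdefault(inst, np.zeros((len(freqs), n_frames)))
            f0 = librosa.midi_to_hz(pitch)
            j0 = int(t0 * sr / hop)                 # first frame of the note
            j1 = min(n_frames, int(t1 * sr / hop) + 1)
            for h in range(1, n_harmonics + 1):
                if h * f0 > sr / 2:                 # stop at the Nyquist frequency
                    break
                k = int(np.argmin(np.abs(freqs - h * f0)))  # nearest FFT bin
                mask[max(0, k - half_width):k + half_width + 1, j0:j1] = 1.0
        return masks

Each instrument's track is then recovered by applying its mask to the mixture STFT and inverting, e.g. librosa.istft(masks['viola'] * S, hop_length=hop). Bins where the harmonics of different instruments overlap are a central difficulty of the real problem and are simply shared here.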
This research impacts the signal-processing community (source separation), the music information retrieval community (music indexing and search) and the artificial intelligence community (tools for intelligent abstraction of real-world data). To broadly disseminate the work, demonstration tools for source separation, score alignment and audio manipulation are available over the internet at music.cs.northwestern.edu and results have been published in relevant journals and conferences.
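The repetition-based separation mentioned above can be illustrated with a simplified sketch in its spirit: the repeating background (e.g., the accompaniment) is modeled as the per-frequency median over spectrogram segments of one repeating period and removed with a soft time-frequency mask. The published method estimates the period automatically from a beat spectrum; here it is supplied by the caller, and the function name is hypothetical.

    import numpy as np
    import librosa

    def separate_repeating(y, sr, period_sec, n_fft=2048, hop=512):
        """Split y into (background, foreground) given a repeating period."""
        S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
        V = np.abs(S)

        # Cut the magnitude spectrogram into whole segments one period long
        # (the trailing partial segment is dropped for simplicity).
        p = max(1, int(round(period_sec * sr / hop)))
        n_seg = V.shape[1] // p
        S, V = S[:, :n_seg * p], V[:, :n_seg * p]
        segments = V.reshape(V.shape[0], n_seg, p)

        # The per-frequency median across segments keeps what repeats and
        # suppresses non-repeating events (e.g., a voice or a phone ring).
        W = np.tile(np.median(segments, axis=1), (1, n_seg))
        W = np.minimum(W, V)  # the background cannot exceed the mixture

        # Soft mask: the fraction of each bin's energy that repeats.
        mask = W / np.maximum(V, 1e-8)
        background = librosa.istft(S * mask, hop_length=hop)
        foreground = librosa.istft(S * (1.0 - mask), hop_length=hop)
        return background, foreground

Because it needs no score and no training data, this style of separation applies directly to the karaoke and noise-removal scenarios described in the abstract.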

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0643752
Program Officer: Maria Zemankova
Budget Start: 2007-01-01
Budget End: 2012-12-31
Fiscal Year: 2006
Total Cost: $506,669
Name: Northwestern University at Chicago
City: Evanston
State: IL
Country: United States
Zip Code: 60201