The goal of this EAGER project is to support exploratory work on a new class of cascade and hybrid classifiers for the automatic indexing of polyphonic music by instruments and instrument types. Testing new classifiers for the automatic indexing of polyphonic music, specifically classifiers that identify instrumental sounds in recordings of orchestral music, is difficult and involves a high degree of risk and uncertainty about the outcome. If successful, the results may prove transformative and have a significant impact on music information analysis. The work will draw on resources in the MIRAI database, developed in an earlier NSF-supported project. The main MIRAI database contains about 1,000,000 musical instrument sounds, each represented as a vector of approximately 1,000 features and each labeled with the instrument that produced it.
Huge repositories of audio recordings, available on the Internet and in private collections, offer a plethora of options for potential listeners. Listeners might be interested in finding particular titles, but they may also wish to find pieces they are unable to name. For example, a user might be in the mood for something joyful, romantic, or nostalgic; he or she may want to find a tune by singing it into the computer's microphone; or the user might want to listen to jazz with a solo trumpet, or classical music with a sweet violin sound. A more advanced user (a musician) might need the score of a piece of music found on the Internet in order to play it himself or herself. All these issues are of interest to researchers working in the music information retrieval (MIR) domain, since the meta-information embedded in audio files lacks such data: recordings are usually labeled only by title and performer, and perhaps by category and playing time. Still, automatic categorization of music pieces is one of the more frequently performed tasks, because users may need more detailed or different categorization than is already provided. Automatic extraction of the melody, or possibly of the full score, is another aim of MIR. Pitch-tracking techniques yield quite good results for monophonic data, but extraction from polyphonic data is much more complicated. When multiple instruments play, information about timbre may help to separate melodic lines for automatic transcription of music (spatial information might also be used here). In this proposal, we focus mainly on automatic recognition of timbre, i.e., of the instruments playing in polyphonic and polytimbral (multi-instrumental) audio recordings. We introduced and tested a hierarchically structured cascade classification system that estimates information on multiple timbres in polyphonic sound through classification based on acoustic features and short-term power spectrum matching.
This cascade classification system first estimates the higher-level decision attribute, which stands for the musical instrument family; further estimation is then performed within that specific family. For each window frame, the cascade process starts with classification at the root of the hierarchical tree, followed by classification at the lower levels of the tree. At each level, from the top to the bottom, the system selects the appropriate classifier and feature set for that level. The classification confidence at each level must be equal to or above a user-specified confidence threshold. When the classification process reaches the bottom level, which is the instrument level, we obtain the final instrument estimates for the window frame, and the overall confidence of each instrument estimate is calculated by multiplying the confidences obtained at each node along its path. After all individual frames have been estimated by the classification system, a smoothing process is performed by calculating the average confidence of each possible instrument within the indexing window. Our experiments showed better performance of the cascade system than of a traditional hierarchical system (in which the same type of classifier is used at all nodes of the tree) or of traditional flat classification methods, which directly estimate the instruments without any higher-level analysis of family information.
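The per-frame cascade and the window-level smoothing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MIRAI implementation: the `Node` structure, its per-node `feature_idx` subset, the classifier interface (a callable returning a label-to-confidence dictionary), the functions `classify_frame` and `smooth_window`, and the toy tree `toy_root` are all hypothetical. The sketch also assumes that every branch meeting the threshold is followed (so one polyphonic frame can yield several instruments) and that an instrument absent from a frame contributes zero to the window average.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a classifier maps a feature vector to {label: confidence}.
Classifier = Callable[[Sequence[float]], Dict[str, float]]


@dataclass
class Node:
    """One node of the hierarchy (root -> instrument family -> instrument)."""
    classifier: Classifier
    feature_idx: List[int]  # hypothetical: indices of the feature subset used here
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify_frame(node: Node, frame: Sequence[float], threshold: float,
                   conf_so_far: float = 1.0) -> Dict[str, float]:
    """Cascade one frame top-down; return {instrument: product of node confidences}."""
    features = [frame[i] for i in node.feature_idx]  # node-specific feature set
    estimates: Dict[str, float] = {}
    for label, conf in node.classifier(features).items():
        if conf < threshold:
            continue  # branch fails the user-specified confidence threshold
        overall = conf_so_far * conf
        child = node.children.get(label)
        if child is None:
            estimates[label] = overall  # label with no child: instrument level reached
        else:
            # Assumption: every branch above threshold is followed, so one
            # polyphonic frame can yield several instruments.
            estimates.update(classify_frame(child, frame, threshold, overall))
    return estimates


def smooth_window(frame_estimates: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each instrument's confidence over all frames of an indexing window.

    Assumption: a frame that did not estimate an instrument contributes 0.
    """
    if not frame_estimates:
        return {}
    totals: Dict[str, float] = {}
    for est in frame_estimates:
        for inst, conf in est.items():
            totals[inst] = totals.get(inst, 0.0) + conf
    return {inst: s / len(frame_estimates) for inst, s in totals.items()}


# Toy, made-up tree for illustration only (not trained on MIRAI data).
toy_root = Node(
    classifier=lambda f: {"strings": f[0], "brass": 1.0 - f[0]},
    feature_idx=[0],
    children={
        "strings": Node(lambda f: {"violin": f[0], "cello": 1.0 - f[0]}, [1]),
        "brass": Node(lambda f: {"trumpet": 0.9, "tuba": 0.1}, [2]),
    },
)

frames = [[0.8, 0.75, 0.0], [0.4, 0.9, 0.0]]
per_frame = [classify_frame(toy_root, fr, threshold=0.3) for fr in frames]
window = smooth_window(per_frame)
```

With these toy classifiers, the first frame passes only the string family (0.8) and yields a violin estimate of about 0.6, while the second frame passes both families (0.4 and 0.6 both meet the 0.3 threshold) and yields violin and trumpet estimates. Following every branch that meets the threshold is one reading of the description; a variant that keeps only the most confident family at each level would be equally consistent with it.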