Vast quantities of character-encoded text form the foundation of the information retrieval revolution of recent decades. In contrast, very little symbolically represented music exists, preventing music from fully participating in the Information Age. The International Music Score Library Project (IMSLP) is a large and rapidly growing open library of public-domain machine-printed classical music scores, actively used by musicians, music scholars, and researchers around the world. The PI's objective in this exploratory project is to begin creating the algorithms needed for optical music recognition (OMR) to convert scanned musical scores into symbolic representations automatically. This is the first step in the PI's long-term goal of building a system to transform the library's score images into symbolic representations, bringing the IMSLP community together in a Wikipedia-like collaboration to mine this invaluable resource. The result will be a comprehensive and open library of symbolic scores, allowing unprecedented access while forming the basis for many new music applications and offering real commercial potential.
While there have been many efforts at OMR over the last fifty years, the current state of the art falls far short of what the IMSLP challenge requires. Existing systems require such laborious error correction that it is not clear whether they actually improve on manual data entry. OMR is much harder than its optical character recognition (OCR) counterpart because of the fundamentally two-dimensional layout of music, which makes it difficult to cast the problem in terms of familiar recognition paradigms. Thus the primary focus of the current work is to create and implement a recognition paradigm suited to the two-dimensional structure of music notation, one that builds in knowledge of notational conventions and allows for automatic adaptation. To this end, the PI will explore strategies that merge recognition with segmentation, express the preference for non-overlapping musical symbols in a principled way, and model the interpretation problem of resolving the often ambiguous meaning of the recognized symbols.
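To make the merged segmentation-recognition strategy concrete, the sketch below (a simplification offered for illustration, not the PI's implementation) runs a standard dynamic program over a one-dimensional span of pixel columns, such as the horizontal extent of a single measure. Each column is assigned either to background or to exactly one hypothesized symbol, so the preference for non-overlapping symbols is enforced by construction; the symbol classes, the span width, and the scoring function score() are hypothetical placeholders for whatever matching models a real recognizer would use.

/*
 * Sketch: merged segmentation and recognition over a one-dimensional
 * span of pixel columns.  A dynamic program assigns each column either
 * to background or to exactly one hypothesized symbol, so recognized
 * symbols can never overlap.  The symbol classes and the function
 * score() are hypothetical placeholders, not the actual recognizer.
 */
#include <stdio.h>

#define WIDTH     200    /* number of pixel columns in the span          */
#define N_CLASSES 4      /* e.g., note head, accidental, clef, rest      */
#define MAX_W     40     /* maximum symbol width in columns              */

/* Hypothetical match score for symbol class c occupying columns [s, e). */
static double score(int c, int s, int e)
{
    (void)c; (void)s; (void)e;
    return 0.0;          /* stand-in for a template or model match       */
}

int main(void)
{
    static double best[WIDTH + 1];   /* best total score up to column i     */
    static int    back[WIDTH + 1];   /* start column of the last hypothesis */
    static int    label[WIDTH + 1];  /* class of last hypothesis, -1 = background */

    best[0] = 0.0;
    for (int i = 1; i <= WIDTH; i++) {
        /* Option 1: column i-1 is background (no symbol).               */
        best[i]  = best[i - 1];
        back[i]  = i - 1;
        label[i] = -1;

        /* Option 2: a symbol of some class ends exactly at column i.    */
        for (int c = 0; c < N_CLASSES; c++)
            for (int w = 1; w <= MAX_W && w <= i; w++) {
                double s = best[i - w] + score(c, i - w, i);
                if (s > best[i]) {
                    best[i]  = s;
                    back[i]  = i - w;
                    label[i] = c;
                }
            }
    }

    /* Trace back to read off the chosen non-overlapping symbols.        */
    for (int i = WIDTH; i > 0; i = back[i])
        if (label[i] >= 0)
            printf("class %d spans columns [%d, %d)\n", label[i], back[i], i);
    return 0;
}

The real problem is of course two-dimensional: any such recursion must be coupled with the vertical structure of the staff (pitch position, chord stacking, beamed groups), which is precisely what makes OMR so much harder than OCR.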
Broader Impacts: A large open music library of symbolic scores, in addition to offering widespread and easy access to one of our culture's richest traditions, will be transformative for musicians, music researchers, and music lovers. Libraries will be able to distribute music scores electronically and globally, allowing for adaptive display, ongoing scholarly annotation, and registration of scores with video and audio. The prevalence of symbolic music scores will open up a world of possibilities for music-science researchers, including systems for music information retrieval, expressive performance, automatic musical accompaniment, transcription and arranging, performance assistance, and many others. Finally, the library will enable important commercial applications. For example, many expect that devices such as the iPad will form the sheet-music "delivery systems" of the future; the benefits of basing such digital music readers on symbolically represented music, including automatic page turning and content-based annotation, are compelling.
Under our NSF EAGER grant we have developed the Ceres system, a software tool for optical music recognition. Ceres is named after the Roman goddess of the harvest, as our goal is to harvest large-scale libraries of symbolic music data. Symbolic music representations will be the basis for music libraries with far greater reach and power than traditional libraries, allowing unlimited access to public-domain holdings, flexible display, extraction of parts from scores, and alignment between scores and audio. Symbolic music data will also be the basis for digital music stands, whose ability to immediately access, transpose, turn pages, and provide performance feedback will inevitably lead to their wide adoption and commercial success. Such music data will support the fledgling field of computational musicology, while enabling a wide range of music informatics applications, including musical accompaniment systems and corpus-based composition. In essence, these data will be the key to ushering music into the 21st century, alongside text.

Optical music recognition (OMR), analogous to its familiar optical character recognition cousin, promises to create the needed symbolic music representations automatically from score images. While efforts in this field date back to the 1960s, the current state of the art is not sufficiently developed to support the creation of large-scale libraries. At present the best systems are commercial, and thus lack the benefits of the open-source model we pursue. With the rise of the International Music Score Library Project (IMSLP), an open library containing nearly 300,000 scanned score images, we feel the need for high-quality OMR more acutely than ever. The holdings of the IMSLP provide the data for our large-scale OMR effort, as well as a community of users to support the process.

The heart of our system is its ability to automatically recognize the contents of scanned music scores (digital images). This includes the overall hierarchical understanding of the music page in terms of systems, staves, and measures, as well as the contents of these measures in terms of notes, chords, beamed groups, accidentals, clefs, slurs, dynamics, etc. During the period of this grant we have made significant progress in our core recognition ability, increasing its accuracy, expanding the range of symbols we identify, and allowing the system to improve automatically by adapting to the data at hand.

However, we have come to see that the definitive music data we seek cannot be the product of recognition alone. Music contains a heavy tail of unusual symbols and notational conventions that have accumulated over several centuries of unconstrained use. It is simply not practical to create a recognizer that covers all of these situations, since in doing so we inevitably sacrifice performance on the simpler and more common scenarios for the sake of rarer and rarer instances. Thus, during the granting period, we have extended our view of the overall task to include human input in the correction and completion of the recognized results. In this phase a person uses a fully implemented drag-and-drop interface to delete and add symbols until the image ink is covered with appropriately labeled primitives. We envision a large and international collection of people using our interface to develop high-quality, definitive music encodings, while supplying the human imprimatur that deems the data worthy of trust. Interactive recognition tools will extend the reach and power of this human-directed phase in later work.
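To make the hierarchical page description above concrete, the sketch below shows one plausible organization of the recognized data; the type names and fields are illustrative assumptions, not the actual Ceres data structures. A page holds systems, a system holds staves, a staff holds measures, and each measure holds the labeled primitives that cover the image ink. These primitives are exactly the objects a person deletes or adds in the drag-and-drop correction phase.

/*
 * Illustrative sketch (not the actual Ceres data structures) of a
 * hierarchical page description: a page holds systems, a system holds
 * staves, a staff holds measures, and each measure holds the labeled
 * primitives (note heads, stems, beams, accidentals, clefs, etc.)
 * that cover the image ink.
 */
#include <stddef.h>

typedef enum {
    SYM_NOTEHEAD, SYM_STEM, SYM_BEAM, SYM_ACCIDENTAL,
    SYM_CLEF, SYM_REST, SYM_SLUR, SYM_DYNAMIC
} SymbolKind;

typedef struct {              /* one labeled primitive on the page          */
    SymbolKind kind;
    int x, y, width, height;  /* bounding box in image pixels               */
    int human_verified;       /* nonzero once a person has confirmed it     */
} Symbol;

typedef struct {              /* one measure within a staff                 */
    Symbol *symbols;
    size_t  n_symbols;
} Measure;

typedef struct {              /* one staff: five lines plus its measures    */
    int      top_y;           /* vertical position of the top staff line    */
    int      line_spacing;    /* distance between adjacent staff lines      */
    Measure *measures;
    size_t   n_measures;
} Staff;

typedef struct {              /* one system: the staves played together     */
    Staff  *staves;
    size_t  n_staves;
} System;

typedef struct {              /* the full scanned page                      */
    System *systems;
    size_t  n_systems;
} Page;

Organizing the data along these lines lets the recognizer's output and the human corrections address one and the same structure, so the encoding that finally carries the human imprimatur is simply the fully verified version of this hierarchy.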
During the granting period we have made significant strides both in our core recognition abilities and in implementing an interface that successfully incorporates the necessary human input, thus allowing the system to be applied in a real-world setting. At present our system is about 50,000 lines of C code, and it grows every day. With NSF support our system has become competitive with the best commercial systems, though we anticipate moving well beyond our current level of success. There is still work to be done before we are ready to unleash our system on the world, but we are committed to carrying this work forward to an end that makes real-world impact.