This SBIR Phase II project applies data mining and machine learning techniques to both natural language descriptions and Internet link graphs to model communities in order to predict preference, taste and sentiment for different kinds of media (music, TV, online media, video games, books). Current contextual information mining approaches that scan the text on a page for advertisement or recommendation ignore valuable community connections inherent in most self-published Internet discussion. Sentiment and opinion extraction systems operating on full text create challenging language parsing problems and are fraught with issues of scale and adaptability. The project's identification systems can automatically categorize anonymous Internet writers or website visitors into specific demographic communities based on their tastes in many kinds of media. The Phase II research project approaches opinion extraction with a bias-free learning model, trained on known online corpora, that can be adapted to different languages and that learns in real time, improving its accuracy as more data becomes available.
Current personalization and marketing approaches either look at the "clickstream" of an anonymous user, leading to equally anonymous recommendations for popular movies and music, or scan a surface-level overview of the text, leading to keyword advertisements with limited contextual understanding of entertainment content and community sentiment. The project plans to fully integrate people-focused community and sentiment analysis technologies into an autonomous, learning and scale-free "media knowledge service" for digital entertainment providers and marketers that can change the way digital content is marketed and sold.
The Echo Nest is the world’s leading music intelligence development platform, with trillions of data points on 35 million songs and 2.5 million artists. Its web Application Programming Interfaces (APIs) power music discovery for some of the most popular digital media services and hundreds of independent applications. While the company was founded on automatic analysis of content (key, tempo, pitch, timbre) and culture (news, blogs, reviews, tweets), this SBIR research project allowed The Echo Nest’s platform to extend its knowledge to the listener. The overall goal is to deliver dynamic user profiling to the digital music market. The application can accurately predict an anonymous user’s demographic (age, gender, location) and psychographic (preferences, interests, lifestyle) profile, based solely on the user’s music taste. We have been able to demonstrate compelling predictive results in a number of audience categories for commercial customers, beating other state-of-the-art demographic and taste prediction approaches. For example, our "taste profile" product was able to predict listeners’ political affiliation just from their music collection. Much of the project centered on scalability. Our novel machine learning approach can now predict correlations among hundreds of thousands of preference variables for millions of users, in a real-time, online software-as-a-service (SaaS) platform. This technology is useful for the field at large. The product, currently in its final phase of commercial evaluation, could dramatically change the digital music business, from an exploding consumer phenomenon in search of a new business model to a robust, profitable online industry.