Information search remains one of the best solutions today for satisfying individuals' problem-solving needs. However, we are inundated with information: its volume across different media keeps growing, and so does its velocity -- the rate at which information is generated, transmitted, and consumed. The growing importance of social media such as Twitter and blogs further exacerbates this problem. It is clear that better real-time search capabilities are needed. This project aims to advance the state of the art in information retrieval research by tackling the real-time search problem. The effort consists of two themes: the first concerns high-performance search architectures for low-latency, high-throughput query evaluation and indexing; the second concerns relevance algorithms, exploring strategies to model time-varying relevance signals in a learning-to-rank framework.
Enhanced real-time search capabilities promise to give users more effective access to time-sensitive information. Scenarios include journalists tracking situations around the globe, victims of natural disasters trying to find loved ones, and political analysts digesting reactions to a candidate's speech. This project is expected to yield an open-source demonstration platform for real-time search on tweets and blogs. Close coordination with shared, community-wide evaluations at the NIST-sponsored Text Retrieval Conferences (TREC) further benefits the broader research community. More information will be disseminated via the project web site (www.umiacs.umd.edu/~jimmylin/projects/). Research results will be incorporated into class material for the large-data computing course that brings cloud computing into the classroom, and graduate students will have an opportunity to gain research and system development experience.