This collaborative research project (IIS-1160894, W. Bruce Croft, University of Massachusetts Amherst, and IIS-1160862, Jamie Callan, Carnegie Mellon University) addresses the complex issues that arise because ephemeral information generated as part of social interactions differs in time scale, quantity, and quality from archival information found on the web. The project investigates the hypothesis that, because of the context provided, searching either ephemeral or archival information is enhanced by using the connections between them. It develops new retrieval models and features for ranking functions in a range of search tasks that can exploit an integrated ephemeral/archival network. Some search tasks are based on previous TREC blog, microblog, and web activities. The project also investigates two new tasks, conversation retrieval and aggregated social search. Conversation retrieval targets information units in the form of "conversations" or "events" instead of simply retrieving social postings or web pages. Aggregated social search ranks information at different granularities, such as sentence, posting, conversation, or thread, based on the underlying query intent.

Research that explores the connections between ephemeral and archival information requires a dataset that contains both types of information. A crucial part of this project extends the archival ClueWeb12 dataset with ephemeral microblog, blog, and discussion forum data that links to the web data. This extension is distributed to the research community as the ClueWeb12++ dataset. This project (http://ciir.cs.umass.edu/research/ephemeral/) is the first to address the full possibilities of search that exploits all the connections and contexts created by bringing together the two "worlds" of information. It also develops and distributes a unique new dataset that supports the development of a new generation of tools to access a broad range of information. Students at the collaborating institutions, University of Massachusetts Amherst and Carnegie Mellon University, will be involved in educational activities and benefit from research experience.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1160894
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2012-08-01
Budget End: 2018-07-31
Support Year:
Fiscal Year: 2011
Total Cost: $663,501
Indirect Cost:
Name: University of Massachusetts Amherst
Department:
Type:
DUNS #:
City: Hadley
State: MA
Country: United States
Zip Code: 01035