A huge amount of government information is available on the Web. The Web sites containing that information are, however, often extremely complex and difficult to navigate. In this type of environment, simple queries are not very helpful in locating relevant information. The current information retrieval (IR) landscape, however, is dominated by simple queries because that is what Web search engines are good at doing. These simple queries generally help the user to find a good home page. In the case of government information on the Web, however, a home page is often of little help in finding the right answers and, instead, a considerable amount of additional user effort is required. Most information needs would be better expressed as complex queries, but current systems impose a bottleneck by forcing users to use simple keyword-based queries. Some types of professional searchers (e.g., intelligence analysts, paralegals) do formulate longer and more complex queries, but complex queries will only become common if systems become capable of providing good answers to those queries and if longer, grammatical questions become easier to ask. The latter issue will eventually be addressed by speech interfaces, but improving the capability of systems to handle complex queries represents the major long-term goal of IR. This award will support initial experiments with retrieval models for complex queries that go beyond the typical bag-of-words approach. There are two major issues that will be explored in the development of new retrieval models. First, in order to improve system robustness, models will be developed that more reliably capture topical relevance than our current models. Second, in order to improve system accuracy in the top-ranked documents, models will be explored that more precisely capture topical relevance.
Intellectual merit of the proposed activity:
Answering complex queries is a hard problem, and one that has a long history of attempted solutions. There are a number of factors, however, that indicate that it should now be possible to make significant progress. In particular, there has been a recent surge of interest in a new approach to retrieval based on language models. The proposed research will leverage this recent work and study complex queries from a new perspective.
Broader impacts resulting from the proposed activity:
In the one-year time frame of this proposal, the award will support exploration of these new models to obtain preliminary results on their effectiveness with government information and on the complex queries that are most representative of those posed by people with government-related information needs. This is a high-risk research project because of the lack of progress in this area in the past. The payoff of even moderate success will be high, however, as it will make the difference between a government information system returning a useful response to a query and that system either failing completely or providing very little assistance to the people seeking answers.