The Integrated Digital Event Archive and Library (IDEAL) system addresses the need to combine the best of digital library and archive technologies in support of stakeholders who are remembering and/or studying important events. It extends the work at Virginia Tech on the Crisis, Tragedy, and Recovery network to handle government and community events, in addition to a range of significant natural or manmade disasters. It addresses the needs of those interested in emergency preparedness/response, digital government, and the social sciences. It demonstrates the effectiveness of the 5S (Societies, Scenarios, Spaces, Structures, Streams) approach to intelligent information systems by crawling and archiving events of broad interest. It leverages and extends the capabilities of two platforms: the Internet Archive, to develop spontaneous event collections that can be permanently archived as well as searched and accessed, and the LucidWorks Big Data software, which supports scalable indexing, analysis, and access for very large collections. Through a new model-based approach to intelligent focused crawling, it improves the quality (e.g., accuracy, coverage, and elimination of noise) of collections of webpages so as to ensure the comprehensiveness, balance, and low bias needed for scholarly study of historically important events by social scientists. It incorporates a range of visualization capabilities in support of key stakeholder communities, including archivists, librarians, researchers, scholars, and the general public. IDEAL connects the processing of tweets and webpages, combining informal and formal media, to automatically detect important events and to support building collections on chosen general or specific topics.
It supports integration of multiple types and at multiple levels, drawing on key models of the event being crawled (event models), the sources of information about the event (source models), the mechanisms used for disseminating information about the event (publishing venue models), and the entities related to the event (society/organization models). Integrated services include topic identification, categorization (building upon special ontologies being devised), sentiment analysis, and visualization of data, information, and context.

The IDEAL website supports searching, browsing, analyzing, and visualizing event collections (of both tweets and webpages), as well as access to project software, methods, findings, publications, and other results. Use of the integrated system and its growing number of collections is encouraged, as is use of particular tools such as those for focused crawling, which should help curators avoid non-relevant content while including a broader range of sources, improving significantly upon current crawling and archiving methods. Important data and information on events of interest are saved rather than lost, helping preserve our history and culture, in support of public interest, education, policy making, historical analyses, and comparative studies. Students studying sociology, human-computer interaction, digital libraries, information retrieval, computational linguistics, multimedia, and hypertext are gaining experience and contributing to scholarly studies, algorithms, software, interfaces, and big data handling.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Program Officer: Maria Zemankova