The system developed in this project maps World Wide Web documents to Library of Congress subject headings. It contrasts with systems that rely on voluntary classification of Web documents by the public at large. The system is designed to automatically classify Web documents into subject headings of the Science (Q) and Technology (T) portions of the Library of Congress classification system, after being trained and tested on several thousand documents previously classified by library scientists and experienced document analysts. It extends an earlier system that combines machine-learning methods with natural language processing and information retrieval techniques to produce descriptive phrases for journal articles in the domain of physical chemistry. The earlier system generated more than 80% of the index phrases produced by trained document analysts with expertise in the domain. The current project establishes that these methods can be scaled up to address a much larger problem domain. In doing so, it also addresses issues relevant to all of natural language processing and information retrieval (e.g., efficient use of contextual cues, and reduction of errors caused by vocabulary unknown to the system). Success in the project will significantly advance concept-based classification and demonstrate its applicability to the vast problem of finding information in the largely unclassified "library" that is the World Wide Web. It will provide an electronic equivalent of library patrons going to the shelves that correspond to their favorite or most-used call numbers.

Project page: www.cs.msstate.edu/artificial_intelligence/kudzu.html

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9734807
Program Officer: Maria Zemankova

Project Start:
Project End:
Budget Start: 1998-09-01
Budget End: 2002-08-31
Support Year:
Fiscal Year: 1997
Total Cost: $278,994
Indirect Cost:

Name: Mississippi State University
Department:
Type:
DUNS #:
City: Mississippi State
State: MS
Country: United States
Zip Code: 39762