This project will carry out research aimed at improving the usability of the Electronic Preprint Archive, a web-based, interactive repository of over 250,000 full-text articles in physics and related disciplines. The Archive is used heavily by these communities and increasingly serves as a model for similar systems and libraries in other fields of science and engineering in the U.S. and abroad. To this end, the structure of the Archive will be explored using document content data, citation tree data, and usage data. Automatic text classification and document clustering techniques will be developed and used to clean, build, and maintain the subject classification structure. Usage logs will be mined to identify functional improvements to the Archive. Burst analysis of words in document titles will be used to map trends and, together with citation data, to visualize and navigate the literature. This work will provide new tools for the continued development of open access systems, which are starting to be used as research libraries in a number of fields, including medicine.
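
As a rough illustration of what burst analysis of title words can look like, the sketch below flags words whose share of title words in one year is much larger than their share in all earlier years. It is only a frequency-ratio heuristic under stated assumptions, not the burst model the project itself uses; the function name `bursty_words`, the `min_count` and `ratio` thresholds, and the sample titles are all hypothetical.

```python
from collections import Counter


def bursty_words(titles_by_year, year, min_count=2, ratio=2.0):
    """Return words whose share of title words in `year` is at least
    `ratio` times their (smoothed) share in all earlier years combined.

    Hypothetical helper: a simple heuristic, not a full burst model.
    """
    current = Counter(
        w for t in titles_by_year[year] for w in t.lower().split()
    )
    earlier = Counter(
        w
        for y, titles in titles_by_year.items()
        if y < year
        for t in titles
        for w in t.lower().split()
    )
    cur_total = sum(current.values())
    # Add-one smoothing so words unseen in earlier years get a nonzero share.
    vocab = len(set(current) | set(earlier))
    prev_total = sum(earlier.values()) + vocab

    bursts = []
    for word, count in current.items():
        if count < min_count:
            continue  # ignore words too rare to call a trend
        cur_share = count / cur_total
        prev_share = (earlier[word] + 1) / prev_total
        if cur_share >= ratio * prev_share:
            bursts.append(word)
    return sorted(bursts)


# Made-up sample titles, for illustration only.
titles = {
    2001: ["quark mass bounds", "lattice quark models"],
    2002: ["lattice gauge theory", "quark confinement"],
    2003: ["graphene transport", "graphene lattice bands",
           "graphene quark analogues"],
}
print(bursty_words(titles, 2003))  # ['graphene']
```

On a real archive the word counts would come from the full title list per time window, and stop words would likely need filtering before any trend map is drawn.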

Agency: National Science Foundation (NSF)
Institute: Division of Physics (PHY)
Application #: 0404553
Program Officer: Kathleen V. McCloud
Project Start:
Project End:
Budget Start: 2004-10-01
Budget End: 2009-09-30
Support Year:
Fiscal Year: 2004
Total Cost: $796,395
Indirect Cost:
Name: Cornell University
Department:
Type:
DUNS #:
City: Ithaca
State: NY
Country: United States
Zip Code: 14850