The goal of this research project is to investigate issues in the design and development of CS-BibCube, a multidimensional text data cube constructed from categorical dimensions (e.g., author list, venue, and date) and unstructured text attributes (e.g., title, abstract, and contents), to facilitate multidimensional online analytical processing (OLAP) and mining of computer science literature. The data cube has become an essential engine in the data warehouse industry and has been extended to handle relatively structured non-relational data, including spatio-temporal data, sequences, graphs, and data streams. However, handling unstructured text data remains challenging. This project explores and evaluates alternatives for the design, multidimensional modeling, implementation, performance improvement, and deployment of text-cubing and text-OLAP. The work will integrate multidisciplinary approaches derived from data cube and OLAP, information retrieval, text mining, and machine learning, and further study is expected to extend to other multidimensional text databases with broad applications in business, industry, government agencies, scientific research, and education.
The research results will be published in research forums on information retrieval, data mining, and database systems, and integrated into the educational program at the University of Illinois at Urbana-Champaign. Project progress and research results will be disseminated via the project Web site (www.cs.uiuc.edu/~hanj/projs/csbibcube.htm).