This project will develop a suite of tools and services to encourage the formation of virtual organizations in scientific communities of various sizes, such as conference groups and departmental research groups. These tools will allow such organizations to filter relevant documents from various input streams, select and enhance the quality of bibliographic data associated with the organization, and attract researchers to contribute to the activity of the organization. Methods of bibliometric analysis, machine learning, and statistical visualization will be applied to assist the exploration and understanding of bibliographic collections of various sizes, for example all work produced by a research group, all work published in a journal, or all work in a field. This will provide an interactive environment that allows the researcher to move beyond static summaries and dynamically explore the environment in which an article of interest exists. In particular, methods of machine learning will be applied to build an article recommendation service, based on collaborative filtering and on semantic analysis of bibliographic data, initially for researchers in probability and statistics. Research will also be done to provide adequate authoring tools that let authors in mathematical fields easily create highly structured, machine-readable documents in LaTeX, BibTeX, or similar formats. These documents can then be aggregated and interlinked in encyclopedic compilations, and subjected to machine learning and statistical analysis to provide high-level overviews of the landscape of these fields. In statistics, mathematics, and related fields, including social science, we expect that the networks of information about authors, publications, problems, and datasets created and exposed through this project will advance these fields by revealing hidden connections among different sub-disciplines and accelerating the transmission of knowledge across them.
With respect to information science, the project should advance understanding of the collaborative production and enhancement of bibliographic information online. It will use flexible similarity metrics, presented in a visually engaging way, to draw interest and encourage researchers to broaden their searches.
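As a rough illustration of how a recommendation service might blend collaborative filtering with semantic analysis, the sketch below scores articles by a weighted mix of two cosine similarities: one over shared readers and one over shared keywords. It is a minimal sketch under stated assumptions, not the project's design: the function names, the `alpha` weight, and the toy data are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target, readers, keywords, alpha=0.5, top_n=3):
    """Rank articles other than `target` by a blended similarity score.

    readers:  article id -> {reader id: weight}  (collaborative signal)
    keywords: article id -> {term: weight}       (semantic signal)
    alpha:    hypothetical weight on the collaborative score;
              the remaining 1 - alpha goes to the semantic score.
    """
    scores = {}
    for article in readers.keys() | keywords.keys():
        if article == target:
            continue
        collab = cosine(readers.get(target, {}), readers.get(article, {}))
        semantic = cosine(keywords.get(target, {}), keywords.get(article, {}))
        scores[article] = alpha * collab + (1 - alpha) * semantic
    # Highest score first; ties are broken by article id for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Toy data (invented for illustration only).
readers = {
    "A": {"r1": 1.0, "r2": 1.0},
    "B": {"r1": 1.0, "r2": 1.0},
    "C": {"r3": 1.0},
}
keywords = {
    "A": {"martingale": 1.0, "brownian": 1.0},
    "B": {"regression": 1.0},
    "C": {"martingale": 1.0, "brownian": 1.0},
}

by_readers = recommend("A", readers, keywords, alpha=1.0)
by_keywords = recommend("A", readers, keywords, alpha=0.0)
```

In this toy data, article B shares its readers with A while article C shares its keywords with A, so moving `alpha` from 1 to 0 switches the top recommendation from B to C. A real service would need to choose this weighting, and the underlying similarity measures, empirically.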
This proposal addresses three fundamental problems of knowledge management: the compartmentalization problem (how to break down the barriers that separate disciplines), the navigation problem (how to guide students and researchers within and between disciplines), and the maintenance problem (how to provide incentives for individuals and organizations to improve the quality of publicly accessible knowledge). We propose to address these problems by gradually distilling the wealth of heterogeneous data now available in digital formats into an openly navigable network of websites, the Bibliographic Knowledge Network (BKN), each node of which is a website dedicated to a specific topic or field of knowledge. Each participating site will typically be designed as a guide for researchers, teachers, and students in a particular field of knowledge, and maintained by a Virtual Organization with a commitment to that field. The BKN will be created through the development of software that makes it easy for a large collection of mostly small and distributed organizations to brand, select, maintain, and annotate collections of structured scientific content. That content will be made available in machine-readable formats, so that connections between ideas in different disciplines can be made using methods of machine learning. The same methods will be applied to provide article recommendation services based on both collaborative filtering and semantic analysis of documents. The collective knowledge system emerging from this project will be available beyond the walls of academia, and will provide well-organized, high-quality information to anyone with an Internet connection. The expository components of the system will attract people from all backgrounds to pursue scientific careers, and will allow students at all levels to encounter materials that lead them to more advanced study.
The system will add great value to other Open Access initiatives, including the system of interoperable digital repositories, Wikipedia, Open Journal Systems, and free academic search services.