The CrystalGrid Framework (CGF) project will research the acquisition, transport, and curation of data across the entire data space of X-ray crystallography, addressing methods for managing wide heterogeneity in data representations, formats, data containers, administrative domains, and instruments and equipment. Until recently, individual labs have simply imposed local homogeneity of format and procedure and have not stored lab-dependent metadata. This ad hoc approach is limited, however, as crystallographers begin to work across labs to accomplish their research objectives, and as increasing numbers and sizes of output data streams leave less time for each investigation. Local workflows must be made explicit, procedures must be formally described, and the history and assemblages of data must be expressed in an open, shareable way. Creating and managing complete, accessible records for each experiment is critical, as is managing heterogeneity in data acquisition and management across the field.

To meet that need, this project will develop a framework of web service interfaces and data and metadata systems addressing the whole spectrum of crystallography. Project participants and collaborators will leverage existing projects, such as Reciprocal Net and Common Instrument Middleware Architecture, that address narrower issues in the problem domain. The CGF will also draw on collaborating projects with overlapping areas of interest, such as the UK-based Comb-e-Chem project. The resulting framework will be a useful environment for crystallographic investigations and an extensible platform on which new web-based applications can be built.

The CGF project involves two classic problems: dealing with heterogeneity in data, procedures, and instruments in the crystallography application space, and integrating all of the domain's data collection, transport, and curation requirements into a seamless end-to-end system. The challenge is to create a virtualization system that manages heterogeneity in more than a single aspect and provides vertical integration using only open, extensible, and interoperable standards and methodologies.

While the project constitutes research into pertinent computer science problems, the research plan is centered on producing a product (the CGF) that will be immediately useful in addressing emerging technical problems in X-ray crystallography. Within crystallography, one specific goal is to make accessible structural results that might otherwise never be seen, so the CGF will help increase the body of scientific knowledge and improve the return on federal investment in the large numbers of X-ray diffractometers and associated instruments nationwide. Although the project specifically targets a few hundred crystallography labs worldwide, the software and methods it creates are intended to be reusable by any science moving from individual lab practices to a shared, global collaboratory system. In sciences such as high-energy physics and astronomy, scientists have long shared single, unique, large instruments and have had to create shared data management and instrument metadata. The CGF is likely to be useful in other scientific disciplines that still use widely distributed lab-based instruments and now need to link them in data grids.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0513768
Program Officer: Sylvia J. Spengler
Budget Start: 2005-09-01
Budget End: 2010-10-31
Fiscal Year: 2005
Total Cost: $426,689

Name: Indiana University
City: Bloomington
State: IN
Country: United States
Zip Code: 47401