Contemporary experimental biologists have access to a large body of disparate kinds of information about many different organisms and biological systems. Although much of the information is still in papers published in journals, primary data increasingly reside in electronically accessible databases. The volume and complexity of the available data exceed the synthetic and reasoning capacities of any individual researcher, stimulating the development of knowledge bases, which represent information about biological systems at a higher level of abstraction than do databases. A serious limitation on the usefulness of existing knowledge bases is that knowledge about the system is effectively "frozen" as it is archived. Observations that contradict the bulk of available evidence are generally omitted at the time of annotation. However, a deeper understanding of how biological systems operate often begins with an observation that contradicts existing knowledge.
The overall objective of this project is to create computational tools that allow the experimental biologist to explore accumulated information about biological systems without precluding access to contradictory information. The approach differs markedly from those currently used to construct knowledge bases such as the pathway databases Reactome (www.reactome.org) and BioCyc (www.biocyc.org), which store currently accepted models based on expert input and the literature and must be revised and rewritten as knowledge grows. The approach taken here is instead to store the ingredients for model building at the evidence level and to enable biologists to participate continuously and actively in model building through the familiar device of formulating and testing hypotheses. The hypotheses themselves are used to query the stored data: each hypothesis is broken down into its constituent relationships, and all of the explicit and implicit assertions that must hold for the hypothesis to be valid are extracted. A set of evaluation rules is then applied to test these assertions for agreement with different types of data, and the user is presented with links to information and data that support the hypothesis as well as links to those that contradict it. Thus, static conclusions are not archived; instead, the data required to elucidate relationships that exist in the system are stored. Although the experimenter's ideas about relationships are tested against what is known, conclusions are not imposed. Rather, both supporting information and contradictions are reported, and the task of evaluating the weight and significance of each is left to the experimenter.
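The hypothesis-testing workflow described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Assertion` and `Evidence` types, the `upstream_of` chaining rule used to derive implicit assertions, and the entity and record names (`geneA`, `record-1`, and so on) are all hypothetical placeholders chosen for illustration. The sketch shows only the general shape of the idea: decompose a hypothesis into assertions, match each against stored evidence, and report both agreeing and disagreeing records without drawing a conclusion.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Assertion(NamedTuple):
    """One relationship that must hold for a hypothesis to be valid (hypothetical type)."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Evidence:
    """A stored observation bearing on one assertion (hypothetical type)."""
    assertion: Assertion
    agrees: bool   # True if the observation agrees with the assertion
    source: str    # placeholder identifier for the underlying record


def extract_assertions(hypothesis):
    """Return the explicit assertions plus implicit ones derived by a toy rule.

    Assumed rule, for illustration only: if A relates to B and B relates to C,
    the hypothesis also implies that A is upstream of C.
    """
    implicit = [
        Assertion(a.subject, "upstream_of", b.obj)
        for a in hypothesis
        for b in hypothesis
        if a.obj == b.subject
    ]
    return list(hypothesis) + implicit


def evaluate(hypothesis, evidence_store):
    """Report supporting and contradicting sources for every assertion.

    No verdict is computed; weighing the evidence is left to the user.
    """
    report = {}
    for assertion in extract_assertions(hypothesis):
        matches = [e for e in evidence_store if e.assertion == assertion]
        report[assertion] = {
            "supporting": [e.source for e in matches if e.agrees],
            "contradicting": [e.source for e in matches if not e.agrees],
        }
    return report


# Placeholder hypothesis and evidence; none of these names refer to real data.
hypothesis = [
    Assertion("geneA", "activates", "geneB"),
    Assertion("geneB", "activates", "geneC"),
]
evidence_store = [
    Evidence(Assertion("geneA", "activates", "geneB"), True, "record-1"),
    Evidence(Assertion("geneA", "activates", "geneB"), False, "record-2"),
    Evidence(Assertion("geneA", "upstream_of", "geneC"), True, "record-3"),
]
report = evaluate(hypothesis, evidence_store)
for assertion, links in report.items():
    print(assertion, links)
```

In this sketch an assertion with no matching evidence is still listed, with empty supporting and contradicting lists, so that gaps in the stored data remain visible rather than being silently dropped.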
This work has several broader impacts. The importance of this approach is that it provides a "thinking" tool for the experimental biologist, as opposed to a knowledge base, which is essentially an electronic textbook. Few computer-assisted thinking tools are available at present, and the success of this project could revolutionize how experimental biologists work. The tools will be integrated into the operation of The Arabidopsis Information Resource (TAIR: www.arabidopsis.org), a community database with 14,000 regular users worldwide.