Many disciplines of modern biology have undergone a revolution in data acquisition. With the advent of high-throughput technologies, data is accumulating at a pace that outstrips our ability to convert it into knowledge. These technologies can produce terabytes of data relevant to a particular biological problem, but interpreting that volume of information remains a challenge. A variety of resources are available to help researchers visualize, categorize and ultimately make sense of their data. Visualization tools such as those in KEGG or Reactome place data in the context of signaling and metabolic pathways. Many different ontologies, text-mining tools and enrichment analysis tools have been developed to help categorize individual data points into groups. Both visualization and categorization reduce the complexity of the problem and provide insight into the underlying biology. Ultimately, however, people are still needed for the essential steps of integrating, evaluating and, finally, converting these data into human knowledge. What is needed is a novel, dynamic approach to pathway visualization that also integrates disparate ontologies and information found in text, to improve the researcher's ability to convert high-throughput data into understanding. This will be achieved by developing PathBubbles, a dynamic, interactive pathway visualization tool that uses the existing Vis- and Code Bubbles as a framework. In addition, data found in specific ontologies, text-mining tools and expression data will be integrated to provide gene annotation for use with PathBubbles. Finally, capturing functional information about post-translationally modified proteins from the literature and integrating it into PathBubbles will assist users in developing testable hypotheses.
Humans are visual animals, relying on visual input to sense and orient themselves to the environment. One consequence is that humans are adept at recognizing patterns in visually displayed information. This work exploits that ability to help biologists analyze thousands of pathway data points by developing a novel web-based interface in which information is displayed as a graph. The graph will display data from gene studies, with each gene product shown as a dot and each connection between gene products shown as a line. The dots can be colored according to the activity of the gene in a particular biological condition. For example, if a gene is expressed at a very high level in a cancer cell compared to a normal cell, its dot will be displayed in red. The lines may represent a variety of interactions, such as binding between gene products or sharing of a small molecule, and the type of interaction can be indicated by line color. The graphical interface is supported by an extensive database of information about each gene product and each interaction. Users will be able to access that information simply by clicking on the dot or line of interest. A particularly novel aspect of this project is that users will be able to add their own data through an interface that allows them to create new dots (gene products) and lines (interactions). They will then be able to provide functional information about what happens, for instance, when their gene product interacts with a gene product already in the database. Based on this new information, the system will predict the effect of the user's new gene product on the biological pathways. This will allow users to ask 'what if' questions, using the interface to test hypotheses before doing actual experiments.
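The graph described above can be illustrated with a minimal Python sketch. It is not the PathBubbles implementation: the class names, the log2 fold-change thresholds, the gene names and every color other than red for high expression are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed cutoffs on log2 fold change (condition vs. control); not from the source.
UP_THRESHOLD = 1.0
DOWN_THRESHOLD = -1.0

# Assumed line colors per interaction type; the source only says colors differ by type.
EDGE_COLORS = {"binding": "black", "shared_small_molecule": "blue"}


@dataclass
class GeneProduct:
    """A dot in the graph: one gene product plus its database annotation."""
    name: str
    log2_fold_change: float = 0.0
    annotation: str = ""

    @property
    def color(self) -> str:
        # Red for strongly elevated expression, as in the text's cancer example.
        if self.log2_fold_change >= UP_THRESHOLD:
            return "red"
        # Green for reduced expression is an assumption of this sketch.
        if self.log2_fold_change <= DOWN_THRESHOLD:
            return "green"
        return "gray"


@dataclass
class Interaction:
    """A line in the graph: a typed connection between two gene products."""
    source: str
    target: str
    kind: str

    @property
    def color(self) -> str:
        return EDGE_COLORS.get(self.kind, "gray")


@dataclass
class PathwayGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_gene_product(self, gene_product: GeneProduct) -> None:
        self.nodes[gene_product.name] = gene_product

    def add_interaction(self, source: str, target: str, kind: str) -> None:
        # Both endpoints must already exist as dots.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown gene product: {name}")
        self.edges.append(Interaction(source, target, kind))

    def neighbors(self, name: str) -> list:
        """Names of gene products directly connected to `name`, sorted."""
        linked = set()
        for edge in self.edges:
            if edge.source == name:
                linked.add(edge.target)
            elif edge.target == name:
                linked.add(edge.source)
        return sorted(linked)


# Placeholder names and made-up values, for illustration only.
graph = PathwayGraph()
graph.add_gene_product(GeneProduct("GENE_A", 2.5, "highly expressed in tumor sample"))
graph.add_gene_product(GeneProduct("GENE_B", -0.2))
graph.add_interaction("GENE_A", "GENE_B", "binding")

# A user adds a new gene product and connects it to an existing one.
graph.add_gene_product(GeneProduct("USER_X", -1.4))
graph.add_interaction("USER_X", "GENE_B", "shared_small_molecule")

for node in graph.nodes.values():
    print(node.name, node.color)
print(graph.neighbors("GENE_B"))
```

Keeping color as a property derived from the expression value, rather than storing it, means the same graph could be recolored for a different biological condition by changing only the expression data. The prediction step for user-added gene products is deliberately left out, since the text does not specify how it would work.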
While the system is being developed in the context of biology, the ability to test different hypotheses graphically will apply to a variety of other disciplines, including chemistry, engineering, physics and computer science.