Biomedical research laboratories still operate much like isolated silos, producing, processing and storing data with only periodic and incomplete knowledge exports in the form of publications. Supplemental material might include data dumps and algorithms, though often in formats that are not amenable to exchange. At the other extreme, we have massive biological data sets being collected and curated in online databases. These "big data" resources suffer from their own issues involving formats, inadequate annotation and technical barriers that diminish accessibility. The WikiPathways project exists at the intersection of individual, domain-specific knowledge retained by researchers and big data repositories of partially overlapping facts, annotations and references. The goal of WikiPathways is to capture knowledge about biological pathways (the entities, interactions and layout) in a form that is both human readable and amenable to computational analysis. Thus, we complete a cycle in which researcher knowledge, when synthesized with standardized data, leads to novel pathway models that can be used to analyze orthogonal data sets, leading in turn to new insights, experiments and knowledge. We have just begun to realize the potential of our web-based community curation system. To support the projected growth, we need to invest significantly in our infrastructure for users, contributors and curators. The following challenges provide the motivation and significance for the proposed aims: (1) representing pathway content in a form that is maximally reusable, and (2) distilling observations from primary experiments (low- and high-throughput) into frameworks of understanding that include prior knowledge, i.e., pathways. We therefore propose new infrastructure to make pathway content more accessible and connected.
We will do this through a measured approach, using semantic technologies where appropriate to take advantage of semantic tools for advanced search, data integration and bioinformatics analysis. We also propose to use more structured data to enhance pathway information displays with interactive themes and to support critical tools for pathway curation. Finally, we will create a set of guidelines and materials for pathway curators to collaborate with researchers, capturing pathway content in active research areas at the source. Instead of relying solely on centralized, internally curated resources for information culled from published articles and periodic submissions, we want to enable researchers to play a direct role in annotating, correcting and promoting representations of their work.

Public Health Relevance

This research will develop new infrastructure for data mining, integration and visualization, as well as collaboration tools to facilitate the capture and exchange of biological pathway data and to catalyze new discovery. Biological pathways are used in all areas of biomedical research, from cardiovascular disease to neurodegenerative diseases to immunology to cancer. By engaging researchers in the curation and exchange of pathway data, our work will impact the process and progress of research in all of these areas.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM100039-03
Application #: 8668099
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
Project Start: 2012-09-01
Project End: 2017-05-31
Budget Start: 2014-06-01
Budget End: 2015-05-31
Support Year: 3
Fiscal Year: 2014
Total Cost: $368,384
Indirect Cost: $168,384

Name: J. David Gladstone Institutes
Department:
Type:
DUNS #: 099992430
City: San Francisco
State: CA
Country: United States
Zip Code: 94158