The biomedical research enterprise is highly productive, generating new knowledge at an unprecedented pace. However, as a community, we do a relatively poor job of organizing and managing that knowledge so that it is maximally useful for the design and interpretation of other experiments. Scientific research is most efficient when new hypotheses are informed by the totality of past findings, and when scientific knowledge is Findable, Accessible, Interoperable, and Reusable (FAIR). Unfortunately, the vast majority of research is published only in free-text, unstructured journal articles, which makes the findings very difficult to integrate and compute upon. This proposal describes the use of crowdsourcing to address this challenge in biomedical knowledge management. It specifically proposes to leverage Wikidata, which has the goal of creating a comprehensive knowledge base that both humans and computers can read and edit. Wikidata is run by the same organization that runs Wikipedia, and like its sister project, it employs the principle of crowdsourcing to tackle a grand challenge in information management. Both Wikipedia and Wikidata invite and empower the community at large to collaboratively add, edit, and refine content. In this proposal, we continue our work to create the world's largest open and FAIR knowledge base of biomedical information within Wikidata. This proposal includes three Specific Aims. First, we will improve both the quantity and quality of biomedical information in Wikidata. Quantity will be increased by loading several key biomedical vocabularies and ontologies, and data quality will be made more rigorous by the introduction of formal and computable data models. Second, we will facilitate and incentivize contributions of data by third-party data contributors. This Aim will be achieved by extending our Python programming library for reading from and writing to Wikidata, and by creating automated reports that notify resource providers when new relevant content is added or edited. Third, we will seek to encourage contributions from domain experts using targeted incentives. Specifically, this Aim will develop interfaces to Wikidata that provide integrated data reports that are otherwise unavailable, and will extend the Gene Wiki Reviews series of invited reviews, which rewards contributions with traditional metrics of academic achievement. Finally, underlying these three Specific Aims will be a Driving Biological Project focusing on infectious disease research, which will ensure that the tools and resources developed have practical benefit to discovery-oriented research projects.

Public Health Relevance

This proposal addresses the challenge of making all biomedical knowledge Findable, Accessible, Interoperable, and Reusable (FAIR). This work builds on Wikidata, a community-maintained knowledge base that can be read and edited by both humans and computers.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Scripps Research Institute
La Jolla
United States
Pecci, Alessandro; Ma, Xuefei; Savoia, Anna et al. (2018) MYH9: Structure, functions and role of non-muscle myosin IIA in human disease. Gene 664:152-167
Janes, Jeff; Young, Megan E; Chen, Emily et al. (2018) The ReFRAME library as a comprehensive drug repurposing library and its application to the treatment of cryptosporidiosis. Proc Natl Acad Sci U S A 115:10750-10755
Daniel, Dianne C; Johnson, Edward M (2018) PURA, the gene encoding Pur-alpha, member of an ancient nucleic acid-binding protein family with mammalian neurological functions. Gene 643:133-143
Schmidt, Laura S; Linehan, W Marston (2018) FLCN: The causative gene for Birt-Hogg-Dubé syndrome. Gene 640:28-42
Wang, Jie; Lee, Jessica; Liem, David et al. (2017) HSPA5 Gene encoding Hsp70 chaperone BiP in the endoplasmic reticulum. Gene 618:14-23
Froimchuk, Eugene; Jang, Younghoon; Ge, Kai (2017) Histone H3 lysine 4 methyltransferase KMT2D. Gene 627:337-342
Lin, Dasheng; Alberton, Paolo; Caceres, Manuel Delgado et al. (2017) Tenomodulin is essential for prevention of adipocyte accumulation and fibrovascular scar formation during early tendon healing. Cell Death Dis 8:e3116
Chen, Kong; Kolls, Jay K (2017) Interluekin-17A (IL17A). Gene 614:8-14
Ghaleb, Amr M; Yang, Vincent W (2017) Krüppel-like factor 4 (KLF4): What we currently know. Gene 611:27-37
Griffith, Malachi; Spies, Nicholas C; Krysiak, Kilannin et al. (2017) CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet 49:170-174

Showing the most recent 10 out of 87 publications