Biomedical and healthcare data sharing efforts are currently impaired by a lack of (1) proper incentives and sharing tools for data producers, (2) practical frameworks for data standardization and indexing, and (3) effective data discovery mechanisms. BioCADDIE is a consortium of data producers, curators, publishers, and consumers who will work together to develop practical, sustainable solutions to the problem of biomedical and healthcare data discovery. Through task forces and corresponding pilot projects addressing the barriers enumerated above, we will promote open discussion of why millions of dollars are currently spent generating data that remain captive at their origin or are shared in a suboptimal way merely to comply with mandates from funding agencies and scientific journals. We will promote the development of incentives, policies, and tools for data sharing and data discovery. We will engage researchers, clinicians, patients, and the broader community in an open dialogue on the pros and cons of biomedical and clinical data sharing. BioCADDIE's specific aims are to: (1) Organize task forces with representatives from communities that have an interest in data production, dissemination, and utilization. We will organize an annual symposium, workshops, and Internet-based discussions among biomedical and clinical researchers, professional societies, journal publishers, funding agencies, clinicians, patients, and information scientists on sustainable best practices for making data easily discoverable by different types of users. (2) Promote the development of realistic, minimal, user-friendly metadata specifications and annotations for biomedical and healthcare data collections, and corresponding tools for automated indexing, so that users will be able to locate data that are relevant to their specific free-text searches. (3) Incubate new technologies by funding highly innovative, high-risk pilot research projects that enable the development of novel data discovery and indexing engines, and have them tested by our diverse community of stakeholders. We describe only a small number of seed pilot projects in this proposal because BioCADDIE will solicit proposals for new pilot projects every year and select them through a review process involving the various stakeholder communities.

Public Health Relevance

Biomedical research and healthcare data are not fully utilized, in part because of a lack of incentives and tools to share these data in a way that makes it possible to reproduce results and make new discoveries. We will form a consortium of data producers, data disseminators, and data consumers (including patients) to develop tools and processes for easy discovery of and access to data.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Resource-Related Research Projects--Cooperative Agreements (U24)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Lin, Dawei
University of California San Diego
Internal Medicine/Medicine
Schools of Medicine
La Jolla
United States
Wimalaratne, Sarala M; Juty, Nick; Kunze, John et al. (2018) Uniform resolution of compact identifiers for biomedical data. Sci Data 5:180029
Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak et al. (2018) DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc :
Dixit, Ram; Rogith, Deevakar; Narayana, Vidya et al. (2017) User needs analysis and usability assessment of DataMed - a biomedical data discovery index. J Am Med Inform Assoc :
Scerri, Antony; Kuriakose, John; Deshmane, Amit Ajit et al. (2017) Elsevier's approach to the bioCADDIE 2016 Dataset Retrieval Challenge. Database (Oxford) 2017:
Cohen, Trevor; Roberts, Kirk; Gururaj, Anupama E et al. (2017) A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge. Database (Oxford) 2017:
Zong, Nansu; Lee, Sungin; Ahn, Jinhyun et al. (2017) Supporting inter-topic entity search for biomedical Linked Data based on heterogeneous relationships. Comput Biol Med 87:217-229
Ohno-Machado, Lucila; Sansone, Susanna-Assunta; Alter, George et al. (2017) Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet 49:816-819
Sitapati, Amy; Kim, Hyeoneui; Berkovich, Barbara et al. (2017) Integrated precision medicine: the role of electronic health records in delivering personalized treatment. Wiley Interdiscip Rev Syst Biol Med 9:
Wright, Theodore B; Ball, David; Hersh, William (2017) Query expansion using MeSH terms for dataset retrieval: OHSU at the bioCADDIE 2016 dataset retrieval challenge. Database (Oxford) 2017:
Perez-Riverol, Yasset; Bai, Mingze; da Veiga Leprevost, Felipe et al. (2017) Discovering and linking public omics data sets using the Omics Discovery Index. Nat Biotechnol 35:406-409

Showing the most recent 10 out of 14 publications