The overall goal of this project is to promote the accessibility and dissemination of biomedical information so that the research community can better leverage existing knowledge. Science is most efficient when hypotheses are based on the entirety of the knowledge available to date. Unfortunately, up-to-date and comprehensive access to relevant knowledge is rarely achieved. This proposal puts particular emphasis on illuminating biomedical "dark data." By analogy to the dark matter that is unaccounted for in the universe, dark data is defined as data that is unseen or underutilized by the scientific community. In this project, we will continue to strengthen our widely used applications, BioGPS and our gene annotation web service, and develop two new applications: BioThings and BioReel. Collectively, these applications aim to make dark data resources Findable, Accessible, Interoperable, and Reusable (FAIR). BioGPS and BioReel are designed for non-computational scientists. BioGPS is a gene portal for aggregating information on human genes and proteins. It illuminates dark data by providing a simple platform for discovering and accessing gene-centric websites. BioGPS users can benefit one another by sharing the specific resources they have discovered and how they use or rate them. BioReel will be developed as a tool that periodically monitors relevant resources and notifies researchers when the knowledge about their genes of interest has been updated (e.g., a new dataset becomes available, or a gene is annotated in a new pathway). Our gene annotation web service and BioThings are designed for bioinformatics developers, who often face source data that is fragmented in content and heterogeneous in format. As a result, almost every bioinformatician has to repeat a significant amount of data wrangling. We developed this service to integrate gene and protein annotation data into a simple, high-performance web Application Programming Interface (API). It illuminates dark data on gene and protein annotations by pre-integrating over 200 annotation types in a standardized format. In this proposal, we will continue to expand it to include additional highly requested annotations, from both a major data repository and smaller domain-specific data sources. In addition, we will generalize the infrastructure and the software pattern underlying the project into a generic API framework called the "BioThings SDK." Two new APIs will be built using this framework, focusing on drugs/chemicals and diseases respectively, where data fragmentation across resources is equally a problem.

Public Health Relevance

BioGPS is a gene annotation portal that is widely used in the biomedical research community. This web application provides researchers with integrated access to biomedical knowledge resources. This proposal will extend our support from genes to drugs and diseases and will build a new application, BioReel, that helps researchers stay up to date on the latest knowledge relevant to their studies.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Scripps Research Institute
La Jolla
United States
Wu, Chunlei; Jin, Xuefeng; Tsueng, Ginger et al. (2016) BioGPS: building your own mash-up of gene annotations and expression profiles. Nucleic Acids Res 44:D313-6
Nicolas, Emmanuelle; Golemis, Erica A; Arora, Sanjeevani (2016) POLD1: Central mediator of DNA replication and repair, and implication in cancer and other pathologies. Gene 590:128-41
Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Mitraka, Elvira et al. (2016) Wikidata as a semantic framework for the Gene Wiki initiative. Database (Oxford) 2016:
Khare, Ritu; Good, Benjamin M; Leaman, Robert et al. (2016) Crowdsourcing in biomedicine: challenges and opportunities. Brief Bioinform 17:23-32
Putman, Tim E; Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra et al. (2016) Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes. Database (Oxford) 2016:
Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus et al. (2016) High-performance web services for querying gene and variant annotation. Genome Biol 17:91
Song, Wei; Wang, Hao; Wu, Qingyu (2015) Atrial natriuretic peptide in cardiovascular biology and disease (NPPA). Gene 569:1-6
Zuehlke, Abbey D; Beebe, Kristin; Neckers, Len et al. (2015) Regulation and function of the human HSP90AA1 gene. Gene 570:8-16
Deneka, Alexander; Korobeynikov, Vladislav; Golemis, Erica A (2015) Embryonal Fyn-associated substrate (EFS) and CASS4: The lesser-known CAS protein family members. Gene 570:25-35
Dörfel, Max J; Lyon, Gholson J (2015) The biological functions of Naa10 - From amino-terminal acetylation to human disease. Gene 567:103-31

Showing the most recent 10 out of 26 publications