An Intelligent Concept Agent for Assisting with the Application of Metadata

Mungall, Christopher

Abstract

Biomedical investigators are generating increasing amounts of complex and diverse data. This data varies tremendously, from genome sequences to phenotypic measurements and imaging data. If researchers and data scientists can tap into this data effectively, we can gain insights into disease mechanisms and how to tackle them. However, the main stumbling block is that it is increasingly hard to find and integrate the relevant datasets because they lack sufficient metadata. A researcher studying Crohn's disease may miss a crucial dataset on how certain microbial communities affect gut histology because the data lacks descriptive tags. Currently, applying metadata is difficult, time-consuming and error-prone due to the vast sea of confusing and overlapping standards for each datatype. Often specialized 'data wranglers' are employed to apply metadata, but even these experts are hindered by a lack of good tools. Here we propose to develop an intelligent agent that researchers and data wranglers can use to help them apply metadata. The agent is based around a personalized dashboard of metadata elements that can be collected from multiple specialized portals, as well as from sites such as Wikipedia. These elements can be coupled with classifiers that self-identify the datasets to which they may be relevant, making the selection of appropriate vocabularies easier for researchers. We will deploy the system for a number of targeted use cases, including annotation of the National Center for Biotechnology Information (NCBI) BioSample repository and annotation of images within the Figshare repository.

Public Health Relevance

Biomedical data is being generated at an increasing rate, and it is becoming increasingly difficult for researchers to locate and work effectively with this data, which slows the rate of new discoveries. One solution is to attach metadata (data about data) to all information generated in a research project, but applying metadata is currently difficult and time-consuming due to the diverse range of standards on offer, and it typically requires the expertise of trained data wranglers. Here we propose to develop an intelligent concept assistant that will allow researchers to generate and share sets of metadata elements relevant to their project, and that will use machine learning techniques to automatically apply these elements to data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 5U01HG009453-03
Application #: 9545836
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Sofia, Heidi J

Project Start: 2016-09-23
Project End: 2019-08-31
Budget Start: 2018-09-01
Budget End: 2019-08-31
Support Year: 3
Fiscal Year: 2018

Institution

Name: Lawrence Berkeley National Laboratory
DUNS #: 078576738

City: Berkeley
State: CA
Country: United States
Zip Code: 94720

Related projects


NIH 2018 U01 HG	An Intelligent Concept Agent for Assisting with the Application of Metadata Mungall, Christopher J. / Lawrence Berkeley National Laboratory
NIH 2017 U01 HG	An Intelligent Concept Agent for Assisting with the Application of Metadata Mungall, Christopher J. / Lawrence Berkeley National Laboratory
NIH 2016 U01 HG	An Intelligent Concept Agent for Assisting with the Application of Metadata Mungall, Christopher J. / Lawrence Berkeley National Laboratory	$575,532

Publications

Matentzoglu, Nicolas; Malone, James; Mungall, Chris et al. (2018) MIRO: guidelines for minimum information for the reporting of an ontology. J Biomed Semantics 9:6

Lizio, Marina; Harshbarger, Jayson; Abugessaisa, Imad et al. (2017) Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals. Nucleic Acids Res 45:D737-D743

Osumi-Sutherland, David; Courtot, Melanie; Balhoff, James P et al. (2017) Dead simple OWL design patterns. J Biomed Semantics 8:18

Köhler, Sebastian; Vasilevsky, Nicole A; Engelstad, Mark et al. (2017) The Human Phenotype Ontology in 2017. Nucleic Acids Res 45:D865-D876

Links, Amanda E; Draper, David; Lee, Elizabeth et al. (2016) Distributed Cognition and Process Management Enabling Individualized Translational Research: The NIH Undiagnosed Diseases Program Experience. Front Med (Lausanne) 3:39
