Capitalizing on the transformative opportunities afforded by the large and ever-growing volume, velocity, and variety of biomedical data is a major challenge. The development and increasingly widespread adoption of several new technologies, including next-generation genetic sequencing, electronic health records and clinical trials systems, and research data warehouses, mean that we are in the midst of an explosion in data production. As a result, the bottleneck in scientific productivity has shifted to data management and interpretation: tools are urgently needed to assist cancer researchers in the assembly, integration, transformation, and analysis of these Big Data sets. In this project, we propose to develop the Semantic Data Lake for Biomedical Research (SDL-BR) system, a cluster-computing software environment that enables rapid data ingestion, multifaceted data modeling, logical and semantic querying and data transformation, and intelligent resource discovery. SDL-BR is based on the idea of a data lake, a distributed store that makes no assumptions about the structure of incoming data and that delays modeling decisions until the data is to be used. This project adds to the data lake paradigm methods for semantic data modeling, integration, and querying, and for resource discovery based on learned relationships between users and data resources.
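
The data lake idea described above, storing incoming data without assuming its structure and deferring modeling until the data is used, can be illustrated with a minimal sketch. This is not the SDL-BR implementation: the class `TinyDataLake`, the source names, the record formats, and the two model functions are all hypothetical, and a single in-memory list stands in for a distributed store.

```python
import json


class TinyDataLake:
    """Toy in-memory stand-in for a data lake (hypothetical, for illustration)."""

    def __init__(self):
        # Raw payloads are kept exactly as received: (source, raw_text) pairs.
        self._raw = []

    def ingest(self, source, raw_text):
        # No parsing or validation at ingestion time.
        self._raw.append((source, raw_text))

    def read(self, source, model):
        # Modeling is deferred to read time: `model` turns a raw string
        # into a record chosen by the caller for a particular application.
        for src, raw in self._raw:
            if src == source:
                yield model(raw)


def csv_patient_model(raw):
    # Assumed format: "patient_id,age"
    patient_id, age = raw.split(",")
    return {"patient_id": patient_id, "age": int(age)}


def json_variant_model(raw):
    # Assumed format: a JSON object with "patient" and "gene" keys
    doc = json.loads(raw)
    return {"patient_id": doc["patient"], "gene": doc["gene"]}


lake = TinyDataLake()
lake.ingest("clinic", "p1,54")
lake.ingest("sequencing", '{"patient": "p1", "gene": "TP53"}')
lake.ingest("clinic", "p2,61")

patients = list(lake.read("clinic", csv_patient_model))
variants = list(lake.read("sequencing", json_variant_model))
```

In this sketch, heterogeneous inputs are ingested side by side, and each application supplies its own model when it reads, so a different model could later be applied to the same raw data without re-ingesting it.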

Public Health Relevance

The SDL-BR System is a distributed computing software solution that enables research institutions to manage and integrate large institutional data sets and make them available to researchers, and that permits users to generate data models specific to particular applications. It uses state-of-the-art cluster computing, Semantic Web, and machine learning technologies to provide rapid data ingestion, semantic modeling and querying, and search and discovery of data resources through a sophisticated, Web-based user interface.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #
4R44CA206782-02
Application #
9536289
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Evans, Gregory
Project Start
2016-09-15
Project End
2020-02-29
Budget Start
2017-09-01
Budget End
2018-08-31
Support Year
2
Fiscal Year
2017
Name
Infotech Soft, Inc.
DUNS #
035354070
City
Miami
State
FL
Country
United States
Zip Code
33131