CAREER: Computational strategies for incompleteness and heterogeneity in multi-omic data

Yan, Jingwen

Abstract

Multi-omics refers to the integrative analysis of multiple types of -omics data (e.g., genotype, gene expression, and protein expression). The growing volume of multi-omic data provides opportunities to discover disease biomarkers across multiple molecular scales and can therefore deepen our understanding of underlying disease mechanisms. Despite this great potential, existing multi-omic data collections are mostly incomplete and heterogeneous in type (e.g., continuous and categorical values). Integrating these data for joint analysis typically requires excluding the many subjects that have missing values; as a consequence, a large portion of the data remains unused. This project provides novel perspectives on handling the incompleteness and heterogeneity problems in multi-omic data and thereby allows biomedical researchers to gain more insight from rapidly growing yet imperfect biomedical data. In addition, the growth of multi-omic data has driven a massive transformation in biomedical research and has created an unprecedented need for information management, decision support, and advanced analytics. In this project, a series of educational activities will be conducted to engage students at early stages of their education and to increase their awareness of educational opportunities and career paths in biomedical informatics.
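
To make the cost of complete-case analysis concrete, the minimal Python sketch below keeps only the subjects measured in every -omics layer and reports the share of subjects discarded. The layer names, subject IDs, and helper functions are hypothetical toy values for illustration, not project data.

```python
# Hypothetical toy cohort: the set of subjects measured in each -omics layer.
layers = {
    "genotype":   {"S1", "S2", "S3", "S4", "S5", "S6"},
    "expression": {"S1", "S2", "S3", "S7"},
    "protein":    {"S2", "S3", "S5", "S8"},
}

def complete_case_subjects(layers):
    """Subjects present in every layer, i.e. those kept by complete-case analysis."""
    return set.intersection(*(set(s) for s in layers.values()))

def fraction_discarded(layers):
    """Share of all observed subjects that complete-case analysis throws away."""
    all_subjects = set().union(*layers.values())
    kept = complete_case_subjects(layers)
    return 1 - len(kept) / len(all_subjects)

print(sorted(complete_case_subjects(layers)))  # ['S2', 'S3']
print(fraction_discarded(layers))              # 0.75
```

In this toy cohort, eight subjects have at least one layer measured, but only two have all three, so a joint analysis restricted to complete cases would ignore three quarters of the subjects.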

This project aims to develop new classes of computational methods that enable joint mining of incomplete and heterogeneous multi-omic data by leveraging various biological networks to discover functionally connected biomarkers. To this end, two tasks will be performed: 1) identify multi-omic subnetworks as biomarkers via a multi-task model for joint network module detection and feature selection, and 2) select associated features between heterogeneous -omics layers via a novel multi-task sparse association model. The first task addresses the incomplete data problem: the new model can not only handle incomplete data collected from one large-scale project but also allow joint analysis of -omics data from multiple small-scale projects with no overlap in subjects. The second task addresses the heterogeneity problem with a novel two-step strategy for associating different -omics layers. Building on these research efforts, three educational outreach activities will be conducted: 1) develop a project-based curriculum for high school students, 2) host an annual summer workshop on multi-omics for high school students, and 3) provide advanced research opportunities to undergraduates in biomedical informatics and related disciplines. This research effort will lead to the discovery of more reliable biomarkers for further validation and to a better understanding of their relationships with disease traits than is currently possible.
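
The abstract does not specify the form of the proposed multi-task sparse association model. As a point of reference only, the sketch below shows a standard single-task building block for this kind of analysis: a rank-one sparse canonical correlation analysis (SCCA) between two -omics layers, computed by alternating soft-thresholded power iterations in the spirit of Witten, Tibshirani and Hastie's penalized matrix decomposition. It is a simplification under stated assumptions (fixed relative thresholds instead of an L1-bound search, continuous features only), it is not the project's method, and it does not address the categorical/continuous heterogeneity targeted by the second task. The function names, parameters, and synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(a, t):
    """Elementwise soft-thresholding: shrink entries toward zero by t, zeroing small ones."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def _normalize(a):
    n = np.linalg.norm(a)
    return a / n if n > 0 else a

def sparse_cca(X, Y, alpha_u=0.5, alpha_v=0.5, n_iter=100, tol=1e-8):
    """Rank-one sparse CCA via alternating soft-thresholded power iterations.

    Assumption: alpha_* in [0, 1) sets each threshold as a fraction of the
    largest absolute projection, so at least one feature per layer survives
    (unless the projection is identically zero).
    """
    # Standardize each feature (column); within-layer covariance is treated as identity.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = Xs.T @ Ys                                   # p x q cross-covariance (times n)
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # start from leading right singular vector
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        cu = C @ v
        u_new = _normalize(soft_threshold(cu, alpha_u * np.abs(cu).max()))
        cv = C.T @ u_new
        v_new = _normalize(soft_threshold(cv, alpha_v * np.abs(cv).max()))
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, v

# Synthetic example: a shared latent signal z drives X[:, :3] and Y[:, :2];
# all other features are pure noise.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = rng.normal(size=(n, 10))
Y = rng.normal(size=(n, 8))
X[:, :3] = z[:, None] + 0.5 * rng.normal(size=(n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.normal(size=(n, 2))

u, v = sparse_cca(X, Y)
print(np.flatnonzero(u), np.flatnonzero(v))  # indices of selected features in each layer
```

The nonzero entries of `u` and `v` are the features selected as mutually associated across the two layers. Thresholding relative to the largest projection is a convenience choice for this sketch; penalized formulations instead tune an explicit sparsity parameter, typically by cross-validation or permutation.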

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1942394
Program Officer: Amarda Shehu

Budget Start: 2020-06-01
Budget End: 2025-05-31
Fiscal Year: 2019
Total Cost: $107,089

Institution: Indiana University, Bloomington, IN, United States
