Scalable Bayesian Network analysis of multimodal FACS and SUMOylation data, with generalization to other big mixed biological datasets

Abstract

Bayesian, or Belief, Network (BN) modeling is a powerful tool that is emerging as one of the principal data analysis, exploration and visualization methods for multimodal (aka mixed, or heterogeneous) "big" biological data. We have previously developed comprehensive BN algorithms and a software package aimed at heterogeneous big biological data analysis. In recent years we have applied it to different biological research domains and datasets (including chromatin interaction, tRNA evolution, genetic epidemiology and metabolomics, cancer epidemiology and single cell thymopoiesis data); work on three more projects (inferring immune signaling networks using FACS data, genome-wide SUMOylation, Alzheimer's genomic analysis) is currently in progress. In the course of this work we have identified crucial "bottlenecks" that need to be addressed, on the methodological level, to make BN analysis universally usable in our general context (that is, big biological data containing large numbers of variables of different types). These issues (scalability of the BN reconstruction process, handling of mixed data types, and interpretation, evaluation and comparison of the resulting network models) have not yet been adequately addressed in the field, thus limiting the usability of the otherwise very powerful and elegant BN approach. Consequently, the primary goal of this project is to develop novel BN analysis algorithms with emphasis on (a) scalability, (b) handling mixed data types, and (c) interpretation and evaluation of the resulting networks.
We are particularly interested in BN analysis of the quantitative flow cytometry (FACS) data generated as part of the ongoing City of Hope cancer immunogenetics research projects, as this type of data exemplifies BN modeling challenges, and any advances in algorithm and software development would be generalizable to most instances of big biological data. We will subsequently apply BN analysis to the SUMOylation and chromatin interaction genomic data (also generated as part of the ongoing collaborative City of Hope research projects), both to further test generalizability and to produce additional biological results.

Public Health Relevance

The primary goal of this project is the development of a Bayesian Network (BN) systems biology data analysis framework, with emphasis on scalability, mixed data handling, and interpretation and comparison of the resulting networks. We are especially interested in BN analysis of the large-scale flow cytometry (FACS) cancer immunogenetics datasets and SUMOylation datasets generated as part of the ongoing City of Hope research projects, as these types of data exemplify BN modeling challenges, and advances in algorithm and software development would be easily generalizable to many instances of heterogeneous big biological data. The secondary goal of this project is the actual cancer immunogenetics FACS data analysis and SUMOylation data analysis, with many biological deliverables.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM013138-01A1
Application #
10048110
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2020-07-01
Project End
2023-03-31
Budget Start
2020-07-01
Budget End
2021-03-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Beckman Research Institute/City of Hope
Department
Type
DUNS #
027176833
City
Duarte
State
CA
Country
United States
Zip Code
91010