Reconstruction of networks involving direct and indirect interactions between genes and/or gene products provides a deep understanding of biological mechanisms in health and disease. However, this task is difficult because experimental data are noisy and the underlying network is large. We propose a method that addresses both challenges systematically, using the human gene interaction network as the target network. We model "interaction" as a stochastic process and use the large number of available external databases as a knowledge source to assign a confidence value to the interaction between two molecules. This information is used by our Bayesian-network-based recovery approach, which fuses experimental data with existing external knowledge to learn interaction networks. In the proposed method, the large interaction atlas is learned using a modular approach motivated by the seemingly compartmentalized yet connected workings of biological pathways. First, macromolecules are clustered using a mutual-information-based distance metric applied to the measured experimental values, so that nodes exhibiting high dependency in their signal distributions are grouped together. Then, the interaction network for each cluster, or module, is learned using the Bayesian-network-based approach, incorporating both experimental data and existing external knowledge. Each module is then represented by a single node, and a second network learning phase over these nodes yields the interaction map between the modules. Finally, linked modules are considered together to identify the interactions between the nodes they contain. The ensemble of the interactions between the unified modules represents the final interactome atlas. We hypothesize that the proposed workflow will compute the large human interaction atlas efficiently and effectively, drastically decreasing the computational load.
We will test the proposed algorithm on synthetic, simulated, and real data sets to assess its performance. We will apply our methodology to pancreatic cancer and other gastrointestinal cancers to identify network characteristics, motifs, and hub genes that are specific to pancreatic cancer. This comparative analysis will be used to interpret the underlying biological mechanisms and suggest potential therapeutic targets. To this end, we will use network parameter estimation and random matrix theory, and we expect to discover novel interactions that are not yet recorded in current databases. To promote "reproducibility of research findings through increased scientific rigor and transparency," we will provide all of the test data sets as well as the source code, executable software, and input data files on our web portal.
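
The clustering step described above (grouping nodes by a mutual-information-based distance) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the project's actual implementation: the abstract does not specify the distance, so the sketch uses the normalized form d = 1 - I(X;Y)/H(X,Y) estimated from equal-width histograms, and it groups nodes with a simple single-linkage threshold rule. The function names (discretize, entropy, mi_distance, cluster_modules), the bin count, and the threshold are all hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map a list of floats to integer bin labels using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant signals fall into one bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mi_distance(x, y, bins=4):
    """Assumed distance: d = 1 - I(X;Y) / H(X,Y), which lies in [0, 1].

    0 means the discretized signals determine each other;
    1 means they share no information.
    """
    dx, dy = discretize(x, bins), discretize(y, bins)
    h_joint = entropy(list(zip(dx, dy)))
    if h_joint == 0:
        return 0.0  # both signals are constant
    mutual_info = entropy(dx) + entropy(dy) - h_joint
    return 1.0 - mutual_info / h_joint


def cluster_modules(profiles, threshold=0.5, bins=4):
    """Single-linkage grouping: join nodes whose distance is below threshold.

    profiles maps a node name to its list of measured values.
    Returns a sorted list of modules, each a sorted list of node names.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if mi_distance(profiles[a], profiles[b], bins) < threshold:
                parent[find(a)] = find(b)

    modules = {}
    for name in names:
        modules.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in modules.values())


if __name__ == "__main__":
    # Toy signals: B is a rescaled copy of A, D is a mirrored copy of C,
    # and A and C are independent after binning.
    a = [float(i) for i in range(16)]
    c = [float(i % 4) for i in range(16)]
    toy = {"A": a, "B": [2 * v for v in a], "C": c, "D": [3 - v for v in c]}
    print(cluster_modules(toy))  # [['A', 'B'], ['C', 'D']]
```

In this toy run the two dependent pairs end up in separate modules. A real analysis would need a more careful mutual-information estimator and a clustering rule chosen for the data; each resulting module would then be passed to the per-module network learning step.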

Public Health Relevance

Learning about networks involving direct and indirect interactions between genes and/or gene products enables scientists to understand and test hypotheses involving the molecular mechanisms of an organism. In this project, we plan to establish a methodology that will reconstruct the human interactome atlas in health and disease. Recovery of large interaction networks enhances our understanding of biological mechanisms at a level beyond individual pathway domains and improves our knowledge of the causes, prevention, and cure of human disease.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21LM012759-02
Application #: 9731692
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2018-07-01
Project End: 2021-06-30
Budget Start: 2019-07-01
Budget End: 2021-06-30
Support Year: 2
Fiscal Year: 2019
Total Cost:
Indirect Cost:
Name: University of Nebraska Lincoln
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 555456995
City: Lincoln
State: NE
Country: United States
Zip Code: 68503