Predictive modeling of biomedical data from clinical studies for early detection, monitoring and prognosis of disease is a crucial step in biomarker discovery. Because the data are typically measurements subject to error, and the sample size of any study is very small compared to the number of variables measured, the validity and verification of models derived from such data sets significantly affect the discovery of reliable discriminatory markers for a disease. Combining information from multiple related data sets is an important opportunity to make the most of these scarce data for more effective biomarker discovery. Because the costs of creating large data sets for every disease of interest are likely to remain prohibitive, methods that make more effective use of related biomarker discovery data sets continue to be important.
Solution: This project develops and applies Transfer Rule Learning (TRL), a novel framework for integrative biomarker discovery from related but separate data sets, such as those generated from similar biomarker profiling studies. TRL alleviates the problem of data scarcity by providing automated ways to express, verify and use prior hypotheses generated from one data set while learning new knowledge from a related data set. This is the first study of transfer learning for biomarker discovery. Unlike other transfer learning approaches, TRL takes knowledge in the form of interpretable, modular classification rules and uses them to seed the learning of a rule model on a new data set. Classification rules simplify the extraction of discriminatory markers, and have been used successfully for biomarker discovery and verification in a non-integrative fashion.
Specific Aims: This project tests the main hypothesis that TRL provides a mechanism for transfer learning of classification rules between related source and target data sets that improves performance on the target data compared to learning without transfer. TRL will be evaluated by cross-validation, using classification accuracy and transfer measures, on related groups of existing biomarker discovery data sets obtained from multiple experimental platforms for lung cancer detection and prognosis. A new set of independent validation data will be generated for early detection of lung cancer to test the models generated on pilot data. The evaluation will also yield insights into the impact of different modeling algorithms on transfer outcomes.
Significance: The TRL framework and tool are important for combined analysis and interpretation of clinical data, as they support incremental building, verification and refinement of rule models for predictive biomedicine. Applying TRL to real-world biomarker discovery data sets can yield insights into novel interactions involving known markers and into the most reliable biomarkers for early detection of disease, particularly lung cancer. This project has the potential to help create new diagnostic screening tools for lung cancer detection. It provides a foundational understanding of the use of transfer learning for integrative biomarker discovery that could lead to novel technologies for combining information from data and prior knowledge.
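The idea of seeding a rule model with prior rules can be illustrated with a minimal Python sketch. It is not the project's TRL implementation: the `Rule` class, the function names, the marker names, the toy data, and the precision and coverage thresholds are all assumptions made for illustration. Rules carried over from a source data set are first verified on the target data, the ones that hold are kept as seeds, and new single-condition rules are learned only for target samples the seeds do not cover.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Sample = Dict[str, str]        # discretized marker name -> level (hypothetical encoding)
Labeled = Tuple[Sample, str]   # (sample, class label)


@dataclass(frozen=True)
class Rule:
    """IF all (marker, level) conditions hold THEN predict label."""
    conditions: Tuple[Tuple[str, str], ...]
    label: str

    def matches(self, sample: Sample) -> bool:
        return all(sample.get(marker) == level for marker, level in self.conditions)


def precision_and_cover(rule: Rule, data: List[Labeled]) -> Tuple[float, int]:
    """Fraction of covered samples the rule labels correctly, and how many it covers."""
    covered = [label for sample, label in data if rule.matches(sample)]
    if not covered:
        return 0.0, 0
    correct = sum(label == rule.label for label in covered)
    return correct / len(covered), len(covered)


def is_reliable(rule: Rule, data: List[Labeled], min_precision: float, min_cover: int) -> bool:
    precision, cover = precision_and_cover(rule, data)
    return precision >= min_precision and cover >= min_cover


def learn_rules(data: List[Labeled], min_precision: float, min_cover: int) -> List[Rule]:
    """Toy learner: enumerate single-condition rules and keep the reliable ones."""
    candidates = {Rule(((m, v),), y) for x, y in data for m, v in x.items()}
    ordered = sorted(candidates, key=lambda r: (r.conditions, r.label))
    return [r for r in ordered if is_reliable(r, data, min_precision, min_cover)]


def transfer_rule_learning(source_rules: List[Rule], target: List[Labeled],
                           min_precision: float = 0.8, min_cover: int = 2) -> List[Rule]:
    # Step 1: verify prior hypotheses (source rules) on the target data.
    seeds = [r for r in source_rules if is_reliable(r, target, min_precision, min_cover)]
    # Step 2: learn new rules only where the verified seeds say nothing.
    uncovered = [(x, y) for x, y in target if not any(r.matches(x) for r in seeds)]
    return seeds + learn_rules(uncovered, min_precision, min_cover)


def predict(rules: List[Rule], sample: Sample, default: str = "unknown") -> str:
    """Return the label of the first matching rule, or a default if none match."""
    for rule in rules:
        if rule.matches(sample):
            return rule.label
    return default


# Hypothetical rules from a source study and a tiny target data set.
source_rules = [
    Rule((("markerA", "high"),), "case"),
    Rule((("markerB", "high"),), "case"),
]
target = [
    ({"markerA": "high", "markerB": "low", "markerC": "low"}, "case"),
    ({"markerA": "high", "markerB": "high", "markerC": "low"}, "case"),
    ({"markerA": "low", "markerB": "high", "markerC": "low"}, "control"),
    ({"markerA": "low", "markerB": "high", "markerC": "low"}, "control"),
    ({"markerA": "low", "markerB": "low", "markerC": "high"}, "case"),
    ({"markerA": "low", "markerB": "low", "markerC": "high"}, "case"),
]
model = transfer_rule_learning(source_rules, target)
```

In this toy setting the `markerA` rule survives verification on the target data and seeds the model, the `markerB` rule is rejected because it mislabels most of the target samples it covers, and rules involving `markerC` are learned from the target data alone. A comparison against learning without transfer would call `learn_rules` directly on `target` and compare cross-validated accuracy.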

Public Health Relevance

This project will develop much-needed computational methods for integrative biomarker discovery from related but separate data sets produced by predictive molecular profiling studies of disease. It will generate new experimental data for early detection of lung cancer, and has the potential to help create new diagnostic screening tools for lung cancer, a leading cause of cancer death in the United States.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM100387-01A1
Application #: 8373065
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Lyster, Peter
Project Start: 2012-09-24
Project End: 2015-07-31
Budget Start: 2012-09-24
Budget End: 2013-07-31
Support Year: 1
Fiscal Year: 2012
Total Cost: $299,716
Indirect Cost: $99,716

Name: University of Pittsburgh
Department: Miscellaneous
Type: Schools of Medicine
DUNS #: 004514360
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
Balasubramanian, Jeya Balaji; Gopalakrishnan, Vanathi (2018) Tunable structure priors for Bayesian rule learning for knowledge integrated biomarker discovery. World J Clin Oncol 9:98-109
Lustgarten, Jonathan Lyle; Balasubramanian, Jeya Balaji; Visweswaran, Shyam et al. (2017) Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure. Data (Basel) 2:
Liu, Yuzhe; Gopalakrishnan, Vanathi (2017) An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data. Data (Basel) 2:
Pineda, Arturo López; Ogoe, Henry Ato; Balasubramanian, Jeya Balaji et al. (2016) On Predicting lung cancer subtypes using 'omic' data from tumor and tumor-adjacent histologically-normal tissue. BMC Cancer 16:184
Huang, Tianzhi; Alvarez, Angel A; Pangeni, Rajendra P et al. (2016) A regulatory circuit of miR-125b/miR-20b and Wnt signalling controls glioblastoma phenotypes through FZD6-modulated pathways. Nat Commun 7:12885
Torbati, Mahbaneh Eshaghzadeh; Mitreva, Makedonka; Gopalakrishnan, Vanathi (2016) Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations. Data (Basel) 1:
Gopalakrishnan, Vanathi; Menon, Prahlad G; Madan, Shobhit (2015) cMRI-BED: A novel informatics framework for cardiac MRI biomarker extraction and discovery applied to pediatric cardiomyopathy classification. Biomed Eng Online 14 Suppl 2:S7
Pineda, Arturo Lopez; Gopalakrishnan, Vanathi (2015) Novel Application of Junction Trees to the Interpretation of Epigenetic Differences among Lung Cancer Subtypes. AMIA Jt Summits Transl Sci Proc 2015:31-5
Ogoe, Henry A; Visweswaran, Shyam; Lu, Xinghua et al. (2015) Knowledge transfer via classification rules using functional mapping for integrative modeling of gene expression data. BMC Bioinformatics 16:226
Menon, Prahlad G; Morris, Lailonny; Staines, Mara et al. (2014) Novel MRI-derived quantitative biomarker for cardiac function applied to classifying ischemic cardiomyopathy within a Bayesian rule learning framework. Proc SPIE Int Soc Opt Eng 9034:
