Predictive modeling of biomedical data from clinical studies for early detection, monitoring and prognosis of disease is a crucial step in biomarker discovery. Because the data are typically measurements subject to error, and the sample size of any study is very small compared to the number of variables measured, the validity and verification of models built from such datasets significantly impact the discovery of reliable discriminatory markers for a disease. An important opportunity to make the most of these scarce data is to combine information from multiple related data sets for more effective biomarker discovery. Because the costs of creating large data sets for every disease of interest are likely to remain prohibitive, methods that make more effective use of related biomarker discovery data sets continue to be important. Solution: This project develops and applies Transfer Rule Learning (TRL), a novel framework for integrative biomarker discovery from related but separate data sets, such as those generated from similar biomarker profiling studies. TRL alleviates the problem of data scarcity by providing automated ways to express, verify and use prior hypotheses generated from one data set while learning new knowledge from a related data set. This is the first study of transfer learning for biomarker discovery. Unlike other transfer learning approaches, TRL takes knowledge in the form of interpretable, modular classification rules and uses them to seed the learning of a rule model on a new data set. Classification rules simplify the extraction of discriminatory markers, and have been used successfully for biomarker discovery and verification in a non-integrative fashion.
Specific Aims: This project tests the main hypothesis that TRL provides a mechanism for transfer learning of classification rules between related source and target data sets that improves performance on the target data, compared to learning without transfer. TRL will be evaluated using cross-validation estimates of classification accuracy and transfer measures, on related groups of existing biomarker discovery datasets obtained from multiple experimental platforms for lung cancer detection and prognosis. A new set of independent validation data will be generated for early detection of lung cancer to test the models generated on pilot data. The evaluation will also examine how different modeling algorithms affect transfer outcomes. Significance: The TRL framework and tool are important for combined analysis and interpretation of clinical data, as they support incremental building, verification and refinement of rule models for predictive biomedicine. Applying TRL to real-world biomarker discovery datasets can yield insights into novel interactions involving known markers and into the most reliable biomarkers for early detection of disease, particularly lung cancer. This project has the potential to help create new diagnostic screening tools for lung cancer detection. It also provides a foundational understanding of the use of transfer learning for integrative biomarker discovery that could lead to novel technologies for combining information from data and prior knowledge.
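The seeding idea described above (verify rules from a source data set on a target data set, keep those that hold, then learn additional rules) can be illustrated with a minimal sketch. This is not the project's TRL implementation: the single-threshold rule form, the names `Rule`, `learn_rules`, `min_precision` and `min_coverage`, and the tiny data sets are all assumptions made up for illustration.

```python
from typing import List, NamedTuple, Sequence


class Rule(NamedTuple):
    """IF x[feature] > threshold THEN predict label (hypothetical rule form)."""
    feature: int
    threshold: float
    label: int


def covers(rule: Rule, x: Sequence[float]) -> bool:
    return x[rule.feature] > rule.threshold


def precision(rule: Rule, X, y) -> float:
    """Fraction of covered samples whose class matches the rule's label."""
    covered = [yi for xi, yi in zip(X, y) if covers(rule, xi)]
    return sum(yi == rule.label for yi in covered) / len(covered) if covered else 0.0


def learn_rules(X, y, seeds=(), min_precision=0.8, min_coverage=2) -> List[Rule]:
    # 1. Verify prior hypotheses: keep only seed rules that hold on this data.
    rules = [r for r in seeds if precision(r, X, y) >= min_precision]
    seeded = {r.feature for r in rules}
    # 2. Learn new knowledge: search only features not already used by a seed.
    for f in range(len(X[0])):
        if f in seeded:
            continue
        best = None  # (coverage, rule) with the widest coverage found so far
        for t in sorted({xi[f] for xi in X}):
            for label in (0, 1):
                cand = Rule(f, t, label)
                cov = sum(covers(cand, xi) for xi in X)
                if cov < min_coverage or precision(cand, X, y) < min_precision:
                    continue
                if best is None or cov > best[0]:
                    best = (cov, cand)
        if best is not None:
            rules.append(best[1])
    return rules


def predict(rules: Sequence[Rule], x: Sequence[float], default: int = 0) -> int:
    """Return the label of the first rule that covers x, else the default class."""
    for r in rules:
        if covers(r, x):
            return r.label
    return default


# Toy, made-up data: feature 0 separates the classes, feature 1 is noise.
source_X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [8.0, 2.0], [9.0, 6.0], [10.0, 3.0]]
source_y = [0, 0, 0, 1, 1, 1]
target_X = [[2.0, 4.0], [2.5, 1.0], [7.0, 3.0], [9.0, 2.0]]
target_y = [0, 0, 1, 1]

source_rules = learn_rules(source_X, source_y)
# Transfer the source rules plus one deliberately bad hypothesis on feature 1.
seeds = source_rules + [Rule(1, 0.5, 1)]
target_rules = learn_rules(target_X, target_y, seeds=seeds)
```

In this toy setting the rule learned on the source data survives verification on the target data, while the deliberately bad seed is discarded. Testing the project's hypothesis would additionally require comparing cross-validated accuracy with and without seeds, which this sketch does not do.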

Public Health Relevance

This project will develop much-needed computational methods for integrative biomarker discovery from related but separate data sets produced by predictive molecular profiling studies of disease. It will generate new experimental data for early detection of lung cancer, and has the potential to help create new diagnostic screening tools for lung cancer, a leading cause of cancer death in the United States.

National Institutes of Health (NIH)
Research Project (R01)
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Lyster, Peter
University of Pittsburgh
Schools of Medicine
United States
Huang, Tianzhi; Alvarez, Angel A; Pangeni, Rajendra P et al. (2016) A regulatory circuit of miR-125b/miR-20b and Wnt signalling controls glioblastoma phenotypes through FZD6-modulated pathways. Nat Commun 7:12885
Pineda, Arturo López; Ogoe, Henry Ato; Balasubramanian, Jeya Balaji et al. (2016) On Predicting lung cancer subtypes using 'omic' data from tumor and tumor-adjacent histologically-normal tissue. BMC Cancer 16:184
Ogoe, Henry A; Visweswaran, Shyam; Lu, Xinghua et al. (2015) Knowledge transfer via classification rules using functional mapping for integrative modeling of gene expression data. BMC Bioinformatics 16:226
Gopalakrishnan, Vanathi; Menon, Prahlad G; Madan, Shobhit (2015) cMRI-BED: A novel informatics framework for cardiac MRI biomarker extraction and discovery applied to pediatric cardiomyopathy classification. Biomed Eng Online 14 Suppl 2:S7
Pineda, Arturo Lopez; Gopalakrishnan, Vanathi (2015) Novel Application of Junction Trees to the Interpretation of Epigenetic Differences among Lung Cancer Subtypes. AMIA Jt Summits Transl Sci Proc 2015:31-5
Avali, Viji R; Cooper, Gregory F; Gopalakrishnan, Vanathi (2014) Application of Bayesian logistic regression to mining biomedical data. AMIA Annu Symp Proc 2014:266-73
Jordan, Rick; Visweswaran, Shyam; Gopalakrishnan, Vanathi (2014) Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids. J Clin Bioinforma 4:13
Balasubramanian, Jeya B; Visweswaran, Shyam; Cooper, Gregory F et al. (2014) Selective model averaging with bayesian rule learning for predictive biomedicine. AMIA Jt Summits Transl Sci Proc 2014:17-22
Menon, Prahlad G; Morris, Lailonny; Staines, Mara et al. (2014) Novel MRI-derived quantitative biomarker for cardiac function applied to classifying ischemic cardiomyopathy within a Bayesian rule learning framework. Proc SPIE Int Soc Opt Eng 9034:
Dutta-Moscato, Joyeeta; Gopalakrishnan, Vanathi; Lotze, Michael T et al. (2014) Creating a pipeline of talent for informatics: STEM initiative for high school students in computer science, biology, and biomedical informatics. J Pathol Inform 5:12
