Relational machine learning methods can significantly improve the predictive accuracy of models for a range of network domains, from social networks to physical and biological networks. These methods automatically learn network correlation patterns from observed data (e.g., in biological networks, interacting proteins are more likely to share a function than two randomly selected proteins) and then use those patterns in a collective inference process to propagate predictions throughout the network. The primary assumption underlying these methods is that model parameters estimated from one network are applicable to other networks drawn from the same distribution. However, there has been little work studying the impact of this assumption, and in particular how variability in network structure affects the performance of relational models and collective inference. This project investigates that issue in order to move beyond the implicit assumption that the networks are drawn from the same underlying distribution. The research will establish a formal framework for learning across heterogeneous network structures and characterize the impact of network structure on models of attribute correlation. The findings will deepen our understanding of how and when relational model performance generalizes across network datasets, and the work will develop new methods to improve generalization.
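To make the idea of collective inference concrete, the following is a minimal illustrative sketch (not the project's method): predictions are propagated through a graph by iterative label propagation, under the homophily assumption that linked nodes tend to share labels. The graph, labels, and function name are all hypothetical.

```python
# Illustrative sketch of collective inference as iterative label propagation.
# Assumes homophily: linked nodes tend to share labels.

def propagate_labels(adj, labels, iterations=10):
    """adj: dict mapping each node to its list of neighbors.
    labels: dict mapping each node to its observed probability of the
    positive class, or None if the node is unlabeled."""
    # Initialize unlabeled nodes to an uninformative prior of 0.5.
    probs = {n: (p if p is not None else 0.5) for n, p in labels.items()}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in adj.items():
            if labels[node] is not None:
                # Observed labels stay clamped to their known values.
                updated[node] = labels[node]
            else:
                # Unlabeled nodes adopt the mean of their neighbors' beliefs.
                updated[node] = sum(probs[n] for n in neighbors) / len(neighbors)
        probs = updated
    return probs

# A small chain a—b—c—d with the endpoints labeled.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
labels = {"a": 1.0, "b": None, "c": None, "d": 0.0}
result = propagate_labels(adj, labels)
```

On this chain, node b (closer to the positive endpoint) converges to a higher positive-class belief than node c, illustrating how network structure shapes the propagated predictions.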

More specifically, in this project the PI makes the key observation that templated graphical models are often used in network classification methods. These models are composed of small (i.e., local) model templates that are ''rolled out'' over a heterogeneous network to dynamically construct a larger model with variable structure for estimation and inference. Due to this roll-out process, the generalizability of a learned model depends on the similarity between the networks used for learning and prediction. In this project, the PI will study this issue in greater depth by formalizing relational learning and collective inference as a ''transfer learning'' problem, with the goal of learning a model from one domain and successfully applying it to a different domain. The research will investigate how best to transfer learned knowledge within networks (i.e., from one labeled part of a network to another) and across networks (i.e., from one network in a population to another). The project will develop rigorous statistical methods and advanced computational algorithms to address these questions via four specific aims: (Aim 1) a formal foundation for assessing transferability within and across networks; (Aim 2) generative models of attributed networks for empirical investigation; (Aim 3) within-network transfer methods for non-stationary data; and (Aim 4) across-network transfer methods using template matching and global smoothing.
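The roll-out idea can be sketched in a few lines. This is a hypothetical illustration, not the project's model: a single pairwise template with one shared parameter is instantiated over every edge of a given network, so the rolled-out model's size, and hence the score it assigns to a labeling, depends on the network's structure. All names and values here are invented for illustration.

```python
# Hypothetical sketch of a templated model "roll-out": one shared pairwise
# template (log-potential w for each edge whose endpoints agree) is
# instantiated over every edge of the given network.

def rolled_out_score(edges, assignment, w=1.0):
    """Log-score of a labeling under the rolled-out model: the same
    edge template is applied once per edge of the network."""
    return sum(w for (u, v) in edges if assignment[u] == assignment[v])

# The same template rolled out over two structurally different networks:
chain = [("a", "b"), ("b", "c")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
labels = {"a": 1, "b": 1, "c": 1}

s_chain = rolled_out_score(chain, labels)        # 2 template instances fire
s_triangle = rolled_out_score(triangle, labels)  # 3 template instances fire
```

Even with identical parameters and an identical labeling, the denser network yields a larger rolled-out model and a different score, which is precisely why transferring a learned model between dissimilar networks is non-trivial.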

Budget Start: 2016-07-01
Budget End: 2021-06-30
Fiscal Year: 2016
Total Cost: $495,308
Name: Purdue University
City: West Lafayette
State: IN
Country: United States
Zip Code: 47907