Artificial intelligence on genomic/healthcare data performed jointly by multiple collaborating institutions relies on a trust model but can accelerate genomic medicine research and facilitate quality improvement. To conduct such machine learning while protecting patient privacy and reducing security risks, we are developing blockchain-based privacy-preserving learning methods in a K99/R00 study supported by the National Human Genome Research Institute (NHGRI). However, our previous design of privacy-preserving learning on a private blockchain assumed "semi-honesty" as the underlying adversary model. That is, we assumed that each participating site is curious yet careful and honest, such that it would only submit correct predictive models. Nevertheless, in the real world this assumption may be too optimistic: a submitted model could be outdated due to network latency, or malicious users may try to create fake models, which in turn can raise bioethical concerns and reduce the incentives for genomic/clinical institutions to participate in collaborative predictive modeling. Therefore, the capability to detect, assess, and prevent "model misconduct" is critical to increasing the integrity and reliability of the machine learning results. To address this issue, we consider the following three types of model misconduct: (1) model plagiarism, in which a site becomes a free-rider and simply submits a copy of a model from another site, attempting to hide its own information while inspecting the models of other sites; (2) model fabrication, in which a site mocks up a model, attempting to hide its information and disturb the machine learning process; and (3) model falsification, in which a site slightly tweaks its model, attempting only to disturb the learning process. For each type of model misconduct, we are interested in how to detect the misconduct of another site, how to assess the loss in machine learning results caused by the misconduct, and how to prevent such misconduct.
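To make the three categories concrete, the sketch below illustrates one plausible way a coordinating node might screen a submitted model; it is a minimal, hypothetical example, not the proposed method. The helper names (model_fingerprint, flag_misconduct), the treatment of models as flat weight vectors, and the thresholds are all illustrative assumptions: exact or near-duplicate weights suggest plagiarism, while an implausible loss on a held-out audit set suggests fabrication or falsification.

```python
import hashlib
import numpy as np

def model_fingerprint(weights: np.ndarray) -> str:
    # Hash the serialized weights so exact copies can be spotted cheaply.
    return hashlib.sha256(weights.tobytes()).hexdigest()

def flag_misconduct(submitted: np.ndarray,
                    peer_models: list,
                    audit_loss: float,
                    expected_loss: float,
                    copy_threshold: float = 0.999,   # illustrative value
                    loss_tolerance: float = 0.5):    # illustrative value
    """Return a suspected misconduct type for one submission, or None."""
    fp = model_fingerprint(submitted)
    for peer in peer_models:
        # (1) Plagiarism: the submission duplicates a peer's model.
        if model_fingerprint(peer) == fp:
            return "plagiarism (exact copy)"
        # Near copies are caught by cosine similarity of the weight vectors.
        cos = np.dot(submitted, peer) / (
            np.linalg.norm(submitted) * np.linalg.norm(peer) + 1e-12)
        if cos > copy_threshold:
            return "plagiarism (near copy)"
    # (2)/(3) Fabrication or falsification: a genuinely trained model should
    # achieve roughly the expected loss on held-out audit data.
    if audit_loss > expected_loss + loss_tolerance:
        return "fabrication or falsification (implausible loss)"
    return None
```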
Our aims include (a) detecting model misconduct using model properties, (b) assessing the losses caused by model misconduct via model simulation, and (c) preventing model misconduct based on the whole model history. The innovative components of our proposed project include (i) summarizing the various types of model misconduct, (ii) developing a complete strategy to handle model misconduct, and (iii) providing a generalizable approach to mitigate bioethical concerns in collaborative machine learning.
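As a rough illustration of aim (c), prevention based on the whole model history could be realized by keeping an append-only, hash-chained record of every submission, so that duplicated or stale models are rejected at submission time. The class below is a simplified stand-in for the on-chain history the proposal envisions; the class name, fields, and rejection rule are assumptions made for this sketch.

```python
import hashlib
import json
import time

class ModelHistoryLedger:
    """Append-only record of model submissions (illustrative stand-in
    for an on-chain model history; all names are hypothetical)."""

    def __init__(self):
        self.entries = []  # each entry chains to the previous via prev_hash

    def submit(self, site_id: str, model_hash: str, round_id: int) -> bool:
        # Reject any model hash already recorded in an earlier round: a
        # cheap guard against plagiarized copies and outdated resubmissions.
        if any(e["model_hash"] == model_hash for e in self.entries):
            return False
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {"site_id": site_id, "model_hash": model_hash,
                "round_id": round_id, "ts": time.time(),
                "prev_hash": prev_hash}
        # Chaining each entry's hash over its predecessor makes after-the-fact
        # tampering with the history detectable.
        body["entry_hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return True
```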
Artificial intelligence performed jointly by multiple collaborating institutions can accelerate genomic medicine research and facilitate quality improvement, but it relies on a trust model that may be too optimistic in real-world settings. In this project, we plan to develop a comprehensive detection, assessment, and prevention mechanism to address the potential bioethical risks posed by model plagiarism, fabrication, and falsification. The proposed study will supplement our original project on privacy-preserving learning on blockchain with considerations of model misconduct.