Reaching Data Confidentiality and Model Accountability on the CalTrain

12/07/2018
by Zhongshu Gu, et al.

Distributed collaborative learning (DCL) paradigms enable building joint machine learning models from mutually distrusting multi-party participants. Data confidentiality is guaranteed by retaining private training data on each participant's local infrastructure. However, this approach to achieving data confidentiality makes today's DCL designs fundamentally vulnerable to data poisoning and backdoor attacks. It also limits DCL's model accountability, which is key to tracing mispredictions back to the responsible "bad" training data instances and contributors. In this paper, we introduce CALTRAIN, a Trusted Execution Environment (TEE) based centralized multi-party collaborative learning system that simultaneously achieves data confidentiality and model accountability. CALTRAIN enforces isolated computation on centrally aggregated training data to guarantee data confidentiality. To support building accountable learning models, we securely maintain the links between training instances and their corresponding contributors. Our evaluation shows that models generated with CALTRAIN achieve the same prediction accuracy as models trained in non-protected environments. We also demonstrate that when malicious training participants attempt to implant backdoors during model training, CALTRAIN can accurately and precisely identify the poisoned and mislabeled training data that lead to runtime mispredictions.
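As a rough illustration of the accountability idea described in the abstract, the sketch below links each training instance to its contributor via a content fingerprint, so that instances implicated in a runtime misprediction can be traced back to the parties who supplied them. The ProvenanceLedger class, the hash-based fingerprints, and the party names are illustrative assumptions, not CALTRAIN's actual TEE-backed implementation.

```python
import hashlib
from collections import defaultdict


class ProvenanceLedger:
    """Illustrative sketch: link training instances to their contributors
    so suspicious instances can be traced back after deployment.
    (Hypothetical structure; not CALTRAIN's actual TEE implementation.)"""

    def __init__(self):
        self._owner = {}                          # fingerprint -> contributor id
        self._by_contributor = defaultdict(list)  # contributor id -> fingerprints

    @staticmethod
    def _fingerprint(instance_bytes: bytes) -> str:
        # A content hash stands in for a secure, TEE-protected record.
        return hashlib.sha256(instance_bytes).hexdigest()

    def record(self, instance_bytes: bytes, contributor: str) -> str:
        # Called during training, inside the protected environment.
        fp = self._fingerprint(instance_bytes)
        self._owner[fp] = contributor
        self._by_contributor[contributor].append(fp)
        return fp

    def trace(self, suspicious_instances: list[bytes]) -> dict:
        # Map instances implicated in a misprediction back to contributors.
        blamed = defaultdict(int)
        for inst in suspicious_instances:
            fp = self._fingerprint(inst)
            if fp in self._owner:
                blamed[self._owner[fp]] += 1
        return dict(blamed)


# Usage: record contributions during training, trace after a misprediction.
ledger = ProvenanceLedger()
ledger.record(b"clean example", contributor="party-A")
ledger.record(b"poisoned example", contributor="party-B")
print(ledger.trace([b"poisoned example"]))  # {'party-B': 1}
```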

