Packing Privacy Budget Efficiently

by Pierre Tholoniat, et al.

Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents a privacy scheduler that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks than a state-of-the-art privacy scheduling algorithm focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
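To make the privacy-knapsack framing concrete, here is a minimal, hypothetical sketch of scheduling under per-block DP budgets. It is not the paper's DPK algorithm (which is not detailed in the abstract); it is a simple greedy heuristic for a multidimensional knapsack, where each data block has a remaining epsilon budget, each task demands epsilon from some blocks, and tasks are admitted in order of profit per unit of normalized budget demand. All names and the profit/demand model are illustrative assumptions.

```python
# Hypothetical illustration of privacy scheduling as a multidimensional
# knapsack: blocks carry a finite DP budget (epsilon), tasks consume it.
# This is a simple greedy heuristic, NOT the DPK algorithm from the paper.

def greedy_privacy_knapsack(budgets, tasks):
    """budgets: dict block -> total epsilon budget.
    tasks: list of (name, profit, demand) tuples, where demand is a
           dict block -> epsilon requested from that block.
    Returns the names of scheduled tasks, in scheduling order."""
    remaining = dict(budgets)

    def density(task):
        _name, profit, demand = task
        # Normalize each block's demand by that block's total budget, so
        # consuming a scarce block's epsilon counts as a higher cost.
        cost = sum(eps / budgets[block] for block, eps in demand.items())
        return profit / cost if cost > 0 else float("inf")

    scheduled = []
    for name, profit, demand in sorted(tasks, key=density, reverse=True):
        # Admit the task only if every block it touches still has budget.
        if all(remaining[block] >= eps for block, eps in demand.items()):
            for block, eps in demand.items():
                remaining[block] -= eps  # budget is consumed forever
            scheduled.append(name)
    return scheduled

if __name__ == "__main__":
    budgets = {"A": 1.0, "B": 1.0}
    tasks = [
        ("t1", 10, {"A": 0.5}),
        ("t2", 6, {"A": 0.6, "B": 0.3}),
        ("t3", 4, {"B": 0.5}),
    ]
    print(greedy_privacy_knapsack(budgets, tasks))  # → ['t1', 't3']
```

Because the problem is NP-hard, a greedy density ordering like this can be arbitrarily far from optimal on adversarial inputs; the paper's point is to design an approximation (DPK) that comes close to the efficiency-optimal schedule in practice.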


Privacy Budget Scheduling

Machine learning (ML) models trained on personal data have been shown to...

Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning

In this paper, we show that the process of continually learning new task...

DP-XGBoost: Private Machine Learning at Scale

The big-data revolution announced ten years ago does not seem to have fu...

Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile

Differential privacy (DP) is the prevailing technique for protecting use...

When Homomorphic Cryptosystem Meets Differential Privacy: Training Machine Learning Classifier with Privacy Protection

Machine learning (ML) classifiers are invaluable building blocks that ha...

Task-aware Privacy Preservation for Multi-dimensional Data

Local differential privacy (LDP), a state-of-the-art technique for priva...

Chained-DP: Can We Recycle Privacy Budget?

Privacy-preserving vector mean estimation is a crucial primitive in fede...