Optimal Differentially Private Learning with Public Data

by Andrew Lowy et al.

Differential Privacy (DP) ensures that training a machine learning model does not leak private data. However, the cost of DP is lower model accuracy or higher sample complexity. In practice, we may have access to auxiliary public data that is free of privacy concerns. This has motivated the recent study of what role public data might play in improving the accuracy of DP models. In this work, we assume access to a given amount of public data and settle the following fundamental open questions:

1. What is the optimal (worst-case) error of a DP model trained over a private data set while having access to side public data, and which algorithms attain it?
2. How can we harness public data to improve DP model training in practice?

We consider these questions in both the local and central models of DP. To answer the first question, we prove tight (up to constant factors) lower and upper bounds that characterize the optimal error rates of three fundamental problems: mean estimation, empirical risk minimization, and stochastic convex optimization. We show that public data reduces the sample complexity of DP model training. Perhaps surprisingly, the optimal error rates can be attained (up to constants) either by discarding the private data and training a model on the public data alone, or by treating the public data as if it were private and running an optimal DP algorithm on the combined data set. To address the second question, we develop novel algorithms that are "even more optimal" (i.e., with better constants) than the asymptotically optimal approaches above. For local DP mean estimation with public data, our algorithm is optimal including constants. Empirically, our algorithms show benefits over existing approaches for DP model training with side access to public data.
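To make the two asymptotically optimal baselines from the abstract concrete, here is a minimal sketch for central-DP mean estimation: one routine discards the private data and averages only the public samples, while the other treats public records as if they were private and applies the standard Gaussian mechanism to the combined data. All function names, parameter choices, and the bounded-data assumption are illustrative, not taken from the paper.

```python
import numpy as np

def public_only_mean(pub):
    """Baseline 1: discard private data; the public-only estimate needs no noise."""
    return float(np.mean(pub))

def dp_mean_with_public(priv, pub, eps, delta, clip=1.0, rng=None):
    """Baseline 2: treat public data as if it were private and run a standard
    (eps, delta)-DP Gaussian-mechanism mean estimator on the combined sample.

    Assumes data is (approximately) bounded; values are clipped to [-clip, clip]
    so that replacing one record changes the mean by at most 2 * clip / n.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.clip(np.concatenate([priv, pub]), -clip, clip)
    n = data.size
    sensitivity = 2.0 * clip / n
    # Classical Gaussian-mechanism noise scale for (eps, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return float(data.mean() + rng.normal(0.0, sigma))
```

With many private samples, the second baseline's noise scale shrinks like 1/n, so it dominates; with very few private samples and a moderate public set, the public-only estimate can already be as accurate. The paper's point is that, up to constants, one of these two extremes is always minimax optimal.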




Related papers:

- Private Estimation with Public Data
- Differentially Private Optimization on Large Model at Small Cost
- Differentially Private Adaptive Optimization with Delayed Preconditioners
- Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting Codes
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification
- Differentially Private Synthetic Data via Foundation Model APIs 1: Images
- DP-PCA: Statistically Optimal and Differentially Private PCA
