When Does Differentially Private Learning Not Suffer in High Dimensions?

07/01/2022
by Xuechen Li, et al.

Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following research question: when does the performance of differentially private learning not degrade with increasing model size? We identify the magnitudes of gradients projected onto subspaces as a key factor that determines performance. To precisely characterize this for private convex learning, we introduce a condition on the objective that we term restricted Lipschitz continuity, and derive improved bounds for the excess empirical and population risks that are dimension-independent under additional conditions. We empirically show that, in private fine-tuning of large language models, gradients evaluated near a local optimum are mostly controlled by a few principal components, mirroring the conditions under which we obtain dimension-independent bounds in the convex setting. Together, our theoretical and empirical results provide a possible explanation for recent successes in large-scale private fine-tuning.
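The abstract invokes restricted Lipschitz continuity without stating it. As a hedged sketch consistent with the projected-gradient language above (the paper's exact formulation may differ in details), one natural formalization is the following:

% Hedged sketch, not the paper's verbatim definition: one natural way to
% formalize restricted Lipschitz continuity via projected gradient magnitudes.
A function $f \colon \mathbb{R}^d \to \mathbb{R}$ is restricted Lipschitz
continuous with coefficients $\{G_k\}_{k=0}^{d}$ if, for every $k$, there is
a rank-$k$ orthogonal projection matrix $P_k \in \mathbb{R}^{d \times d}$
such that
\[
  \sup_{x \in \mathbb{R}^d} \bigl\lVert (I_d - P_k)\,\nabla f(x) \bigr\rVert_2
  \le G_k .
\]
% Taking $k = 0$ recovers ordinary $G_0$-Lipschitzness; rapidly decaying
% $G_k$ means gradients live almost entirely in a low-dimensional subspace,
% the regime in which dimension-independent bounds become plausible.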
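To make the empirical claim concrete, below is a minimal, hypothetical sketch (not the authors' code) of the kind of diagnostic the abstract suggests: collect gradients near an optimum and measure how much of their squared norm the top principal components capture. The toy logistic-regression setup and all names here are illustrative assumptions, standing in for a fine-tuning objective.

# Hypothetical diagnostic sketch: how concentrated are gradients near an
# optimum in a few principal components? (Toy problem, not the paper's setup.)
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression problem in d dimensions with low-dimensional signal.
n, d = 200, 500
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 2.0                                  # signal in 5 coordinates
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))            # sigmoid predictions
    return X.T @ (p - y) / n                      # mean logistic-loss gradient

# Run plain gradient descent long enough to sit near a local optimum.
w = np.zeros(d)
for _ in range(500):
    w -= 0.5 * grad(w)

# Stack gradients evaluated at small perturbations around the optimum.
G = np.stack([grad(w + 0.01 * rng.normal(size=d)) for _ in range(100)])

# Fraction of squared gradient norm captured by the top-k principal components.
_, s, _ = np.linalg.svd(G, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
for k in (1, 5, 10, 20):
    print(f"top-{k:2d} PCs capture {energy[k - 1]:.1%} of gradient energy")

If the printed fractions saturate at small k, the gradients are effectively low-dimensional, which is the behavior the paper reports for gradients of large language models near a local optimum.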


