Analyzing Leakage of Personally Identifiable Information in Language Models

by Nils Lukas, et al.

Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage. Scrubbing techniques reduce but do not prevent the risk of PII leakage: in practice, scrubbing is imperfect and must balance the trade-off between minimizing disclosure and preserving the utility of the dataset. On the other hand, it is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee sentence- or user-level privacy, prevent PII disclosure. In this work, we propose (i) a taxonomy of PII leakage in LMs, (ii) metrics to quantify PII leakage, and (iii) attacks showing that PII leakage is a threat in practice. Our taxonomy provides rigorous game-based definitions for PII leakage via black-box extraction, inference, and reconstruction attacks with only API access to an LM. We empirically evaluate attacks against GPT-2 models fine-tuned on three domains: case law, health care, and e-mails. Our main contributions are (i) novel attacks that can extract up to 10 times more PII sequences than existing attacks, (ii) showing that sentence-level differential privacy reduces the risk of PII disclosure but still leaks about 3% of PII sequences, and (iii) a subtle connection between record-level membership inference and PII reconstruction.
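The black-box extraction attack described in the abstract can be pictured as follows: the adversary samples generations from the model's API, tags candidate PII spans in the output, and treats strings that recur across samples as likely memorized training data. The sketch below illustrates this pipeline only; `sample_model` is a hypothetical stub standing in for real API access, and the naive regex tagger is a placeholder for the NER-based PII taggers such attacks actually use.

```python
import re
from collections import Counter

def sample_model(n_samples):
    # Hypothetical stub for black-box API access to a fine-tuned LM.
    # A real attack would request n_samples generations from the model.
    return [
        "Please contact John Smith at the office.",
        "The claimant, John Smith, appeared before the court.",
        "Dr. Alice Wong reviewed the discharge summary.",
    ][:n_samples]

# Naive PII tagger: two capitalized tokens in a row (illustration only;
# real attacks use trained named-entity recognition to tag PII spans).
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def extract_pii(samples):
    """Count candidate PII strings across all sampled generations."""
    counts = Counter()
    for text in samples:
        counts.update(NAME_PATTERN.findall(text))
    return counts

counts = extract_pii(sample_model(3))
# Strings generated repeatedly are stronger candidates for memorized PII.
print(counts.most_common(1))  # → [('John Smith', 2)]
```

The design choice here mirrors the paper's threat model: the adversary never sees weights or gradients, only sampled text, so all signal comes from how often identifiable strings surface in generations.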


