You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas

04/26/2022
by Haoran Li, et al.

Social chatbots, also known as chit-chat chatbots, are evolving rapidly with large pretrained language models. Despite this progress, privacy concerns have recently arisen: the training data of large language models can be extracted via model inversion attacks. Moreover, the datasets used to train chatbots contain many private conversations between two individuals. In this work, we investigate the privacy leakage of the hidden states of chatbots trained via language modeling, which has not yet been well studied. We show that speakers' personas can be inferred from these hidden states by a simple neural network with high accuracy. To address this, we propose effective defense objectives to protect persona leakage from the hidden states. Extensive experiments demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% while preserving the language models' powerful generation ability.
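To make the threat model concrete, below is a minimal, hypothetical sketch (not the paper's released code) of such a persona-inference probe: a small PyTorch MLP that an attacker trains to map a chatbot's dialogue hidden states to persona labels. The hidden-state dimension, label space, and training loop are illustrative assumptions rather than details from the paper.

```python
# Hypothetical sketch of a persona-inference attack on chatbot hidden states.
# Assumptions: hidden states are 768-dimensional vectors, and persona inference
# is framed as classification over a fixed set of candidate persona labels.
import torch
import torch.nn as nn

class PersonaProbe(nn.Module):
    """A simple MLP that maps a dialogue hidden state to a persona label."""
    def __init__(self, hidden_dim: int = 768, num_personas: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_personas),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # Returns logits over the candidate persona labels.
        return self.net(hidden_state)

# Attacker's training step: fit the probe on (hidden state, persona label) pairs.
probe = PersonaProbe()
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

hidden_states = torch.randn(32, 768)           # placeholder batch of hidden states
persona_labels = torch.randint(0, 100, (32,))  # placeholder persona labels

optimizer.zero_grad()
loss = criterion(probe(hidden_states), persona_labels)
loss.backward()
optimizer.step()
```

The defense objectives proposed in the paper aim to make exactly this kind of probe fail, driving its accuracy down while keeping the chatbot's generation quality intact.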

Related research:

KART: Privacy Leakage Framework of Language Models Pre-trained with Clinical Records (12/31/2020)
Privacy Analysis in Language Models via Training Data Leakage Report (01/14/2021)
Knowledge Sanitization of Large Language Models (09/21/2023)
Multi-step Jailbreaking Privacy Attacks on ChatGPT (04/11/2023)
Submix: Practical Private Prediction for Large-Scale Language Models (01/04/2022)
Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage (05/22/2023)
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy (10/31/2022)
