Privacy Guarantees for De-identifying Text Transformations

08/07/2020
by David Ifeoluwa Adelani, et al.

Machine learning approaches to natural language processing tasks benefit from a comprehensive collection of real-life user data. At the same time, there is a clear need to protect the privacy of the users whose data is collected and processed. For text collections such as transcripts of voice interactions or patient records, replacing sensitive parts with benign alternatives can provide de-identification. However, how much privacy do such text transformations actually guarantee, and are the resulting texts still useful for machine learning? In this paper, we derive formal privacy guarantees for general text transformation-based de-identification methods on the basis of Differential Privacy. We also measure the effect that different ways of masking private information in dialog transcripts have on a subsequent machine learning task. To this end, we formulate different masking strategies and compare their privacy-utility trade-offs. In particular, we compare a simple redaction approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks such as named entity recognition, intent detection, and dialog act classification. We find that only word-by-word replacement is robust against performance drops in various tasks.
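To make the two masking families named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The lexicon `SENSITIVE`, the `[REDACTED]` placeholder, and both function names are hypothetical, and a uniform random draw from a same-type word list stands in for the deep learning models the paper uses for word-by-word replacement.

```python
import random

# Hypothetical toy lexicon: sensitive words grouped by entity type.
SENSITIVE = {
    "NAME": ["alice", "bob", "carol"],
    "CITY": ["berlin", "lagos", "paris"],
}
# Reverse lookup from a sensitive word to its entity type.
ENTITY_TYPE = {word: etype for etype, words in SENSITIVE.items() for word in words}


def redact(tokens):
    """Replace every sensitive token with a fixed placeholder."""
    return ["[REDACTED]" if tok.lower() in ENTITY_TYPE else tok for tok in tokens]


def replace_word_by_word(tokens, rng):
    """Swap each sensitive token for a random word of the same entity type.

    Assumption: random sampling is a stand-in for a learned replacement model.
    """
    out = []
    for tok in tokens:
        etype = ENTITY_TYPE.get(tok.lower())
        out.append(rng.choice(SENSITIVE[etype]) if etype else tok)
    return out


utterance = "Alice wants a flight to Berlin".split()
print(redact(utterance))
print(replace_word_by_word(utterance, random.Random(0)))
```

Redaction removes the sensitive words entirely, whereas replacement keeps the sentence looking like natural text, which is one plausible reason a downstream model might cope better with it.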


research
11/01/2022

User-Entity Differential Privacy in Learning Natural Language Models

In this paper, we introduce a novel concept of user-entity differential ...
research
08/04/2021

With One Voice: Composing a Travel Voice Assistant from Re-purposed Models

Voice assistants provide users a new way of interacting with digital pro...
research
11/07/2017

Quality-Efficiency Trade-offs in Machine Learning for Text Processing

Data mining, machine learning, and natural language processing are power...
research
05/25/2018

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

This paper presents the machine learning architecture of the Snips Voice...
research
06/02/2021

Differential Privacy for Text Analytics via Natural Text Sanitization

Texts convey sophisticated knowledge. However, texts also convey sensiti...
research
10/29/2020

Differential Privacy and Natural Language Processing to Generate Contextually Similar Decoy Messages in Honey Encryption Scheme

Honey Encryption is an approach to encrypt the messages using low min-en...
research
06/16/2020

Building a Collaborative Phone Blacklisting System with Local Differential Privacy

Spam phone calls have been rapidly growing from nuisance to an increasin...
