User-Centered Security in Natural Language Processing

01/10/2023
by Chris Emmery, et al.

This dissertation proposes a framework for user-centered security in Natural Language Processing (NLP) and demonstrates how it can improve the accessibility of related research. Accordingly, it focuses on two NLP security domains of great public interest. The first is author profiling, which can be employed to compromise online privacy through invasive inferences. Without access to these models, or detailed insight into their predictions, Internet users have no reasonable heuristic by which to defend themselves against such inferences. The second is cyberbullying detection, which by default presupposes a centralized implementation, i.e., content moderation across social platforms. As access to appropriate data is restricted, and the nature of the task evolves rapidly (through both lexical variation and cultural shifts), the effectiveness of its classifiers is greatly diminished and thereby often misrepresented. Under the proposed framework, we predominantly investigate adversarial attacks on language, i.e., changing a given input (generating adversarial samples) such that a given model no longer functions as intended. These attacks form a common thread between our user-centered security problems: they are highly relevant to privacy-preserving obfuscation methods against author profiling, and adversarial samples may also prove useful for assessing the influence of lexical variation and augmentation on cyberbullying detection.
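
To make the notion of an adversarial sample concrete, below is a minimal sketch of a greedy lexical substitution attack against a toy bag-of-words profiler. The weight lexicon, synonym table, and greedy search are illustrative assumptions for this sketch, not the models or attacks studied in the dissertation.

```python
# A minimal sketch of a lexical substitution attack on a toy
# bag-of-words profiler. The weights and synonym table below are
# invented for illustration; they are not the dissertation's models.

# Toy profiler: a positive score predicts one (hypothetical) class.
WEIGHTS = {"hey": 1.2, "totally": 0.9, "awesome": 0.8,
           "hello": -0.7, "quite": -0.6, "impressive": -0.5}

# Hypothetical substitution candidates; in practice these could come
# from word embeddings or a masked language model.
SYNONYMS = {"hey": ["hello"], "totally": ["quite"],
            "awesome": ["impressive"]}

def score(tokens):
    """Sum of per-token weights; the sign is the predicted class."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)

def attack(tokens):
    """Greedily swap in the synonym that pushes the score hardest
    toward the opposite class; stop once the prediction flips."""
    source_class = score(tokens) > 0
    adv = list(tokens)
    for i in range(len(adv)):
        candidates = [adv[:i] + [s] + adv[i + 1:]
                      for s in SYNONYMS.get(adv[i], [])]
        if not candidates:
            continue
        best = (min(candidates, key=score) if source_class
                else max(candidates, key=score))
        if (score(best) < score(adv)) == source_class:
            adv = best  # substitution moves us toward the boundary
        if (score(adv) > 0) != source_class:
            break  # prediction flipped: adversarial sample found
    return adv

print(attack("hey that was totally awesome".split()))
# -> ['hello', 'that', 'was', 'quite', 'awesome'] (prediction flipped)
```

In realistic settings such substitutions must also preserve the meaning and fluency of the original text, which is what makes attacks of this kind usable as privacy-preserving obfuscation against author profiling.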

research
01/27/2021

Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling

Written language contains stylistic cues that can be exploited to automa...
research
06/21/2023

Sample Attackability in Natural Language Adversarial Attacks

Adversarial attack research in natural language processing (NLP) has mad...
research
07/02/2018

The Interplay between Lexical Resources and Natural Language Processing

Incorporating linguistic, world and common sense knowledge into AI/NLP s...
research
10/12/2020

From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks

Adversarial attacks are label-preserving modifications to inputs of mach...
research
04/10/2022

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Adversarial attacks are a major challenge faced by current machine learn...
research
08/27/2023

Detecting Language Model Attacks with Perplexity

A novel hack involving Large Language Models (LLMs) has emerged, leverag...
