Are Large Language Models Really Robust to Word-Level Perturbations?

by Haoyu Wang, et al.

The swift advancement in the scale and capabilities of Large Language Models (LLMs) positions them as promising tools for a variety of downstream tasks. Beyond the pursuit of better performance and the avoidance of violent or harmful responses to certain prompts, ensuring that LLMs behave responsibly has drawn considerable attention to their robustness. However, existing evaluation methods mostly rely on traditional question-answering datasets with predefined supervised labels, which do not align with the superior generation capabilities of contemporary LLMs. To address this issue, we propose a novel, rational evaluation approach that leverages pre-trained reward models as diagnostic tools to evaluate the robustness of LLMs, which we refer to as the Reward Model for Reasonable Robustness Evaluation (TREvaL). Our extensive empirical experiments demonstrate that TREvaL provides an accurate method for evaluating the robustness of an LLM, especially on more challenging open-ended questions. Furthermore, our results show that LLMs frequently exhibit vulnerability to word-level perturbations, which are commonplace in everyday language use. Notably, we were surprised to discover that robustness tends to decrease as fine-tuning (SFT and RLHF) is conducted. The code of TREvaL is available in
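The abstract does not specify which perturbations, reward model, or scoring protocol TREvaL uses, so the following is only a minimal sketch of the general idea of reward-model-based robustness evaluation, under several assumptions. A typo-style swap of two adjacent inner characters stands in for "word-level perturbations"; the LLM and the pre-trained reward model are passed in as plain callables; and robustness is summarized as the drop in reward between the response to the clean prompt and the mean reward of responses to perturbed prompts, with every response scored against the clean prompt. The names `perturb_words`, `reward_drop`, `toy_reward`, and `echo_llm` are hypothetical and are not taken from the paper or its code.

```python
import random
from statistics import mean
from typing import Callable


def perturb_words(prompt: str, rate: float, rng: random.Random) -> str:
    """Hypothetical word-level perturbation: in roughly `rate` of the words
    longer than three characters, swap two adjacent inner characters,
    keeping the first and last characters fixed (typo-style noise)."""
    out = []
    for w in prompt.split():
        if len(w) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(w) - 2)  # i in [1, len(w) - 3]
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        out.append(w)
    return " ".join(out)


def reward_drop(
    llm: Callable[[str], str],
    reward_model: Callable[[str, str], float],
    prompt: str,
    n_samples: int = 5,
    rate: float = 0.2,
    seed: int = 0,
) -> float:
    """Assumed robustness score: reward of the clean response minus the mean
    reward of responses to perturbed prompts. All responses are scored against
    the CLEAN prompt, so the reward reflects answer quality for the intended
    question. A larger value means the model is less robust."""
    rng = random.Random(seed)
    clean = reward_model(prompt, llm(prompt))
    noisy = mean(
        reward_model(prompt, llm(perturb_words(prompt, rate, rng)))
        for _ in range(n_samples)
    )
    return clean - noisy


# Toy stand-ins for illustration only (not a real LLM or reward model).
def echo_llm(prompt: str) -> str:
    """Returns the prompt verbatim."""
    return prompt


def toy_reward(prompt: str, response: str) -> float:
    """Fraction of the prompt's words that appear in the response."""
    words = prompt.split()
    seen = set(response.split())
    return sum(w in seen for w in words) / len(words)


if __name__ == "__main__":
    q = "Explain gradient descent briefly"
    print(reward_drop(echo_llm, toy_reward, q, rate=0.0))
    print(reward_drop(echo_llm, toy_reward, q, rate=1.0))
```

With the echo model, `rate=0.0` leaves the prompt unchanged and yields a drop of 0.0, while `rate=1.0` corrupts every word of this prompt and yields 1.0. In practice, `llm` would wrap a generation call and `reward_model` a pre-trained reward model that scores a (prompt, response) pair. Scoring against the clean prompt is a design choice here: it asks whether the model still answers the question the user meant.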
