The Poison of Alignment

08/25/2023
by Aibek Bekbayev, et al.

From the perspective of content safety, alignment has been shown to limit large language models' (LLMs) harmful content generation. This intentional method of reinforcing models not to respond to certain user inputs seems to be present in many modern open-source instruction-tuning datasets such as OpenAssistant or Guanaco. We introduce a novel insight into how an instruction-tuned model's performance is affected by the presence of alignment in the supervised fine-tuning dataset. Specifically, we observe that alignment acts as if it is poisoning the instruction dataset. Experimentally, we demonstrate that aligned answers significantly worsen the performance of the resulting fine-tuned model on various reasoning benchmarks such as Big-Bench Hard (BBH), Massive Multitask Language Understanding (MMLU), HumanEval, and Discrete Reasoning Over Paragraphs (DROP), performing worse than the counterpart tuned without alignment by 4-33%.
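The abstract's observation suggests a simple data-cleaning remedy: identify refusal-style "aligned" answers in the supervised fine-tuning set and drop them before tuning. The Python sketch below illustrates that idea with a keyword heuristic; it is not the authors' actual pipeline, and the marker list and the "instruction"/"response" record schema are illustrative assumptions.

# Sketch: filter refusal-style "aligned" answers out of an instruction-tuning
# dataset before fine-tuning. The phrase list and record schema are
# assumptions, not the paper's exact method.
REFUSAL_MARKERS = [
    "i'm sorry, but",
    "as an ai language model",
    "i cannot assist with",
    "it is not appropriate",
]

def is_aligned_answer(response: str) -> bool:
    # Heuristically flag a response as a safety refusal.
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def filter_dataset(records: list[dict]) -> list[dict]:
    # Keep only instruction/response pairs whose answer is not a refusal.
    return [r for r in records if not is_aligned_answer(r["response"])]

if __name__ == "__main__":
    data = [
        {"instruction": "Summarize the abstract.",
         "response": "The paper argues that aligned answers degrade reasoning."},
        {"instruction": "How do I pick a lock?",
         "response": "I'm sorry, but I can't help with that."},
    ]
    print(f"kept {len(filter_dataset(data))} of {len(data)} examples")

In practice, keyword matching misses paraphrased refusals, so a classifier-based filter would be more robust; this sketch only illustrates the filtering step itself.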


Related research

AMR Parsing with Instruction Fine-tuned Pre-trained Language Models (04/24/2023)
Instruction fine-tuned language models on a collection of instruction an...

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback (07/29/2023)
A key technology for the development of large language models (LLMs) inv...

Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective (12/20/2022)
Are large language models (LLMs) like GPT-3 psychologically safe? In thi...

Making Large Language Models Better Reasoners with Alignment (09/05/2023)
Reasoning is a cognitive process of using evidence to reach a sound conc...

Poisoning Language Models During Instruction Tuning (05/01/2023)
Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetun...

Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models (09/11/2023)
Readability metrics and standards such as Flesch Kincaid Grade Level (FK...

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (08/18/2023)
Larger language models (LLMs) have taken the world by storm with their m...
