Large Language Models with Controllable Working Memory

by Daliang Li, et al.

Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. What further sets these models apart is the massive amount of world knowledge they internalize during pretraining. Many downstream applications provide the model with an informational context to aid its performance on the underlying task. However, how the model's world knowledge interacts with the factual information presented in the context remains underexplored. Ideally, an LLM should give precedence to the context whenever it contains task-relevant information that conflicts with the model's memorized knowledge. This grounds model predictions in the context, so the context can then be used to update or correct specific predictions without frequent retraining. By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge. In this paper, we undertake a first joint study of these two properties, namely controllability and robustness, for LLMs. We demonstrate that state-of-the-art T5 and PaLM models (both pretrained and finetuned) can exhibit poor controllability and robustness, and that neither property improves with increasing model size. As a solution, we propose a novel method, Knowledge Aware FineTuning (KAFT), which strengthens both controllability and robustness by incorporating counterfactual and irrelevant contexts into standard supervised datasets. Our comprehensive evaluation shows the utility of KAFT across model architectures and sizes.
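The abstract says only that KAFT adds counterfactual and irrelevant contexts to a standard supervised dataset. The sketch below illustrates that kind of augmentation for question answering. It is not the paper's recipe. It rests on two assumptions of our own. First, a counterfactual is built by swapping the gold answer string in the context for another example's answer, and the target follows the edited context (controllability). Second, when the context is irrelevant, the original gold answer stands in for the model's memorized knowledge (robustness). `Example`, `counterfactual`, `irrelevant`, and `augment` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    question: str
    context: str
    answer: str


def counterfactual(ex: Example, alt_answer: str) -> Example:
    """Assumed construction: replace the gold answer inside the context with
    an alternative answer; the target follows the edited context."""
    if ex.answer not in ex.context:
        raise ValueError("answer must appear in the context to be replaced")
    return Example(ex.question, ex.context.replace(ex.answer, alt_answer), alt_answer)


def irrelevant(ex: Example, other_context: str) -> Example:
    """Assumed construction: pair the question with an unrelated context; the
    target stays the original answer, a stand-in for memorized knowledge."""
    return Example(ex.question, other_context, ex.answer)


def augment(data: list[Example], seed: int = 0) -> list[Example]:
    """Return the original examples plus counterfactual and irrelevant variants."""
    rng = random.Random(seed)
    out = list(data)  # keep the original relevant-context examples
    for i, ex in enumerate(data):
        others = [d for j, d in enumerate(data) if j != i]
        if not others:
            continue
        donor = rng.choice(others)  # another example supplies answer/context
        if donor.answer != ex.answer and ex.answer in ex.context:
            out.append(counterfactual(ex, donor.answer))
        # Crude relevance heuristic (assumption): the donor context must not
        # mention this example's answer.
        if ex.answer not in donor.context:
            out.append(irrelevant(ex, donor.context))
    return out


# Usage on a toy dataset of two examples.
data = [
    Example("Capital of France?", "Paris is the capital of France.", "Paris"),
    Example("Largest planet?", "Jupiter is the largest planet.", "Jupiter"),
]
aug = augment(data)
for ex in aug:
    print(ex)
```

With only two examples, each one borrows the other's answer and context. The toy run therefore yields the two originals, two counterfactuals (e.g. "Jupiter is the capital of France." with target "Jupiter"), and two irrelevant-context variants whose targets keep the original answers.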
