TART: A plug-and-play Transformer module for task-agnostic reasoning

06/13/2023 ∙ by Kush Bhatia, et al.
Large language models (LLMs) exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying models for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same examples. While most existing approaches (e.g., prompt engineering) focus on the LLM's learned representations to patch this performance gap, our analysis actually reveals that LLM representations contain sufficient information to make good predictions. As such, we focus on the LLM's reasoning abilities and demonstrate that this performance gap exists due to their inability to perform simple probabilistic reasoning tasks. This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and propose TART, which generically improves an LLM's reasoning abilities using a synthetically trained Transformer-based reasoning module. TART trains this reasoning module in a task-agnostic manner using only synthetic logistic regression tasks and composes it with an arbitrary real-world pre-trained model without any additional training. With a single inference module, TART improves performance across different model families (GPT-Neo, Pythia, BLOOM), model sizes (100M - 6B), tasks (14 NLP binary classification tasks), and even across different modalities (audio and vision). Additionally, on the RAFT Benchmark, TART improves GPT-Neo (125M)'s performance such that it outperforms BLOOM (176B), and is within 4% of GPT-3 (175B). Code is available at https://github.com/HazyResearch/TART .
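
The abstract only sketches the recipe, so the following is a minimal, hypothetical PyTorch sketch of the general idea: train a small causal Transformer purely on synthetic in-context logistic regression tasks, then reuse it, frozen, on (embedding, label) sequences produced by any base model. The dimensions, architecture sizes, interleaving scheme, and the PCA-style composition step mentioned in the final comment are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch (not the authors' code): a task-agnostic reasoning module
# trained only on synthetic logistic regression tasks presented in-context.
import torch
import torch.nn as nn

D = 16          # dimension of synthetic feature vectors (assumed)
SEQ_LEN = 32    # number of in-context (x, y) examples per task (assumed)

def sample_logistic_task(batch_size):
    """Sample synthetic tasks: random weight w, inputs x ~ N(0, I),
    labels y ~ Bernoulli(sigmoid(w . x))."""
    w = torch.randn(batch_size, D, 1)
    x = torch.randn(batch_size, SEQ_LEN, D)
    logits = (x @ w).squeeze(-1)
    y = torch.bernoulli(torch.sigmoid(logits))
    return x, y

class ReasoningModule(nn.Module):
    """Small causal Transformer that reads an alternating (x, y) sequence and
    predicts each label from the examples that precede it."""
    def __init__(self, d_model=64, nhead=4, nlayers=4):
        super().__init__()
        self.embed_x = nn.Linear(D, d_model)
        self.embed_y = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, y):
        # Interleave tokens as x_1, y_1, x_2, y_2, ...
        bx = self.embed_x(x)                                   # (B, T, d)
        by = self.embed_y(y.unsqueeze(-1))                     # (B, T, d)
        tokens = torch.stack([bx, by], dim=2).flatten(1, 2)    # (B, 2T, d)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.encoder(tokens, mask=mask)
        # Read predictions at the x positions (even indices), which can only
        # attend to earlier (x, y) pairs under the causal mask.
        return self.head(h[:, 0::2, :]).squeeze(-1)            # (B, T) logits

# Task-agnostic training loop on purely synthetic tasks.
model = ReasoningModule()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for step in range(200):                                        # toy step count
    x, y = sample_logistic_task(batch_size=64)
    loss = loss_fn(model(x, y), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference (sketch): replace the synthetic x's with reduced-dimension
# embeddings of real inputs from a frozen base model (e.g. projected down to D),
# keep the reasoning module's weights fixed, and read the logit at the position
# of the test example.
```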

Related research

∙ 11/01/2022: Preserving In-Context Learning ability in Large Language Model Fine-tuning
∙ 05/28/2020: Language Models are Few-Shot Learners
∙ 12/31/2019: oLMpics – On what Language Model Pre-training Captures
∙ 03/22/2023: Are LLMs the Master of All Trades?: Exploring Domain-Agnostic Reasoning Skills of LLMs
∙ 11/17/2022: Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
∙ 04/25/2022: Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks
∙ 07/04/2022: Factorizing Knowledge in Neural Networks