Teaching Arithmetic to Small Transformers

07/07/2023
by Nayoung Lee, et al.

Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded in the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and that simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-token prediction objective for rapidly eliciting arithmetic capabilities.
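
To make the data-formatting idea concrete, below is a minimal sketch in Python (with hypothetical function names, not the paper's actual data pipeline) of three ways an addition example could be serialized for next-token-prediction training: the conventional "a+b=c" form, a reversed-answer form where the result is written least-significant digit first, and a chain-of-thought style form that spells out intermediate digit-and-carry steps. The exact formats used in the paper may differ.

# Sketch of addition-example formats for next-token-prediction training.
# Illustrative only; the paper's exact serialization may differ.

def plain_format(a: int, b: int) -> str:
    # Conventional format: "a+b=c".
    return f"{a}+{b}={a + b}"


def reversed_format(a: int, b: int) -> str:
    # Answer written least-significant digit first, so each output digit
    # depends only on operand digits and carries already available.
    return f"{a}+{b}={str(a + b)[::-1]}"


def scratchpad_format(a: int, b: int) -> str:
    # Chain-of-thought style: record each digit addition and carry
    # before stating the final answer.
    da, db = str(a)[::-1], str(b)[::-1]
    carry, steps = 0, []
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        s = x + y + carry
        steps.append(f"digit {i}: {x}+{y}+carry {carry} = {s % 10}, carry {s // 10}")
        carry = s // 10
    if carry:
        steps.append(f"digit {max(len(da), len(db))}: carry {carry}")
    steps.append(f"answer: {a + b}")
    return f"{a}+{b}=\n" + "\n".join(steps)


if __name__ == "__main__":
    print(plain_format(128, 367))      # 128+367=495
    print(reversed_format(128, 367))   # 128+367=594
    print(scratchpad_format(128, 367))

One intuition behind such formatting changes is that reversing the answer lets the model emit each output digit only after the operand digits and carries that determine it, which better matches the left-to-right nature of next-token prediction.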

Related research

08/02/2023 · Arithmetic with Language Models: from Memorization to Computation
A better understanding of the emergent computation and problem-solving c...

01/31/2023 · Numeracy from Literacy: Data Science as an Emergent Skill from Large Language Models
Large language models (LLM) such as OpenAI's ChatGPT and GPT-3 offer uni...

09/13/2023 · Auto-Regressive Next-Token Predictors are Universal Learners
Large language models display remarkable capabilities in logical and mat...

06/27/2023 · Length Generalization in Arithmetic Transformers
We examine how transformers cope with two challenges: learning basic int...

06/03/2021 · When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations
Vision Transformers (ViTs) and MLPs signal further efforts on replacing ...

09/06/2023 · GPT Can Solve Mathematical Problems Without a Calculator
Previous studies have typically assumed that large language models are u...

07/06/2022 · Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation
Mathematical reasoning is one of the most impressive achievements of hum...
