LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

by Kaiyu Yang, et al.

Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables programmatic interaction with the proof environment. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection: a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover that is augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 96,962 theorems and proofs extracted from Lean's math library. It features a challenging data split that requires the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.
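
To make the premise-selection idea concrete, the following is a minimal sketch of retrieval restricted to accessible premises. It is not ReProver's implementation and does not use LeanDojo's API: the bag-of-words `embed` function stands in for a learned encoder, and the premise names, statements, and the `accessible` set are hypothetical toy data chosen for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (an assumed stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_premises(state, premises, accessible, k=2):
    """Return the names of the k accessible premises most similar to the state."""
    query = embed(state)
    scored = [
        (cosine(query, embed(statement)), name)
        for name, statement in premises.items()
        if name in accessible  # skip premises the theorem cannot legally use
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical toy library: names and statements are illustrative only.
premises = {
    "add_comm": "a + b = b + a",
    "mul_comm": "a * b = b * a",
    "add_zero": "a + 0 = a",
    "future_lemma": "a + b = b + a",  # matches well but is not accessible
}
accessible = {"add_comm", "mul_comm", "add_zero"}

top = retrieve_premises("x + y = y + x", premises, accessible, k=2)
print(top)
```

In this toy setup, filtering by accessibility before ranking keeps `future_lemma` out of the results even though its statement matches the query as well as `add_comm` does. That is the effect the abstract attributes to identifying accessible premises: the retriever never spends its budget on candidates that could not appear in the proof.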


Proving Parikh's theorem using Chomsky-Schutzenberger theorem

Parikh's theorem was originally stated and proved by Rohit Parikh in MIT ...

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

In learning-assisted theorem proving, one of the most critical challenge...

Autoformalization with Large Language Models

Autoformalization is the process of automatically translating from natur...

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

In theorem proving, the task of selecting useful premises from a large l...

Proof Artifact Co-training for Theorem Proving with Language Models

Labeled data for imitation learning of theorem proving in large librarie...

Learning to Prove from Synthetic Theorems

A major challenge in applying machine learning to automated theorem prov...

Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving

Large language models (LLMs) present an intriguing avenue of exploration...
