Generalization on the Unseen, Logic Reasoning and Degree Curriculum

by Emmanuel Abbe, et al.

This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. The motivation is that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that, for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree-interpolator (MDI) is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky MDIs. These findings lead to two implications: (1) we provide an explanation for the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports.
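To make the notion of a min-degree interpolator concrete, here is a minimal sketch on a toy problem that is not taken from the paper. It assumes inputs in {-1, +1}^3, a hypothetical target x0 * x1, and a training set in which the coordinate x0 is frozen to +1, so that every point with x0 = -1 is unseen. The function names (`target`, `candidate_mdi`, `fourier_coefficient`, `degree`) are illustrative only. The sketch checks that the lower-degree function x1 fits all seen data yet disagrees with the target on the unseen part of the cube.

```python
from itertools import combinations, product

N = 3  # number of Boolean inputs (assumed for this toy example)


def target(x):
    # Hypothetical degree-2 target: the monomial x0 * x1.
    return x[0] * x[1]


def candidate_mdi(x):
    # Lower-degree function that matches the target whenever x0 = +1.
    return x[1]


def fourier_coefficient(f, subset, n=N):
    # Fourier-Walsh coefficient of f on the monomial prod_{i in subset} x_i,
    # computed by averaging over the full hypercube {-1, +1}^n.
    total = 0
    for x in product([-1, 1], repeat=n):
        chi = 1
        for i in subset:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n


def degree(f, n=N):
    # Largest size of a subset with a nonzero Fourier-Walsh coefficient.
    deg = 0
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if abs(fourier_coefficient(f, subset, n)) > 1e-12:
                deg = k
    return deg


cube = list(product([-1, 1], repeat=N))
seen = [x for x in cube if x[0] == 1]     # training domain: x0 frozen to +1
unseen = [x for x in cube if x[0] == -1]  # never observed during training

# Both functions interpolate the seen data exactly.
agrees_on_seen = all(target(x) == candidate_mdi(x) for x in seen)

# Mean squared error of the lower-degree interpolator on the unseen domain.
unseen_error = sum((target(x) - candidate_mdi(x)) ** 2 for x in unseen) / len(unseen)

print("agrees on seen:", agrees_on_seen)
print("degree of target:", degree(target))
print("degree of candidate:", degree(candidate_mdi))
print("squared error on unseen:", unseen_error)
```

In this toy setting, a learner biased toward low-degree interpolators would return x1 rather than x0 * x1, which is the kind of behavior the abstract attributes to the studied architectures on the unseen domain.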



Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures

This paper considers the Pointer Value Retrieval (PVR) benchmark introdu...

Logic Diffusion for Knowledge Graph Reasoning

Most recent works focus on answering first order logical queries to expl...

A Note on "Assessing Generalization of SGD via Disagreement"

Jiang et al. (2021) give empirical evidence that the average test error ...

An Optimization Framework for Task Sequencing in Curriculum Learning

Curriculum learning is gaining popularity in (deep) reinforcement learni...

Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning

Out-of-distribution generalization (OODG) is a longstanding challenge fo...

Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs

Experimental results have shown that curriculum learning, i.e., presenti...

Reptile: a Scalable Metalearning Algorithm

This paper considers metalearning problems, where there is a distributio...
