ThinkSum: Probabilistic reasoning over sets using large language models

by Batu Ozturkler, et al.

Large language models (LLMs) have a substantial capacity for high-level analogical reasoning: reproducing patterns in linear text that occur in their training data (zero-shot evaluation) or in the provided context (few-shot in-context learning). However, recent studies show that even the largest LLMs fail in scenarios that require reasoning over multiple objects or facts or making sequences of logical deductions. We propose a two-stage probabilistic inference paradigm, ThinkSum, that reasons over sets of objects or facts in a structured manner. In the first stage (Think – 'fast' retrieval of associations), an LLM is queried in parallel over a set of phrases extracted from the prompt or an auxiliary model call. In the second stage (Sum – 'slow' probabilistic inference or reasoning), the results of these queries are aggregated to make the final prediction. We demonstrate the advantages of ThinkSum on the BIG-bench suite of evaluation tasks, achieving improvements over the state of the art using GPT-family models on ten difficult tasks, often with far smaller model variants. We compare and contrast ThinkSum with other proposed modifications to direct prompting of LLMs, such as variants of chain-of-thought prompting. We argue that because the probabilistic inference in ThinkSum is performed outside of calls to the LLM, ThinkSum is less sensitive to prompt design, yields more interpretable predictions, and can be flexibly combined with latent variable models to extract structured knowledge from LLMs.
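The two-stage paradigm described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `toy_llm_logprob` function is a hypothetical stand-in for a real LLM scoring call, and the Sum stage here uses a simple average of per-phrase probabilities as one possible aggregation rule.

```python
import math

def think(llm_logprob, phrases, answers):
    """Think stage: query the LLM (in parallel, conceptually) over each
    extracted phrase, collecting a log-probability per candidate answer."""
    return {
        phrase: {a: llm_logprob(phrase, a) for a in answers}
        for phrase in phrases
    }

def sum_stage(per_phrase_logprobs, answers):
    """Sum stage: aggregate per-phrase probabilities *outside* the LLM
    (here, an average over phrases) and return the highest-scoring answer."""
    scores = {}
    for a in answers:
        probs = [math.exp(lp[a]) for lp in per_phrase_logprobs.values()]
        scores[a] = sum(probs) / len(probs)
    return max(scores, key=scores.get), scores

# Hypothetical stand-in for an LLM log-probability query; a real system
# would score `answer` conditioned on `phrase` with a model call.
def toy_llm_logprob(phrase, answer):
    return 0.0 if answer in phrase else -2.0

phrases = ["a penguin is a bird", "a penguin cannot fly"]
answers = ["bird", "fish"]
best, scores = sum_stage(think(toy_llm_logprob, phrases, answers), answers)
```

Because the aggregation happens in ordinary code rather than inside a prompt, the per-phrase scores in `scores` remain inspectable, which is the interpretability advantage the abstract highlights.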




Enhance Reasoning Ability of Visual-Language Models via Large Language Models

Pre-trained visual language models (VLM) have shown excellent performanc...

EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning

Large language models primarily rely on in-context learning to execute ta...

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

Large language models (LLMs) have been shown to be capable of impressive...

Boosting Theory-of-Mind Performance in Large Language Models via Prompting

Large language models (LLMs) excel in many tasks in 2023, but they still...

Psychologically-informed chain-of-thought prompts for metaphor understanding in large language models

Probabilistic models of language understanding are interpretable and str...

Tab-CoT: Zero-shot Tabular Chain of Thought

The chain-of-thought (CoT) prompting methods were successful in various n...

Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration

We identify two crucial limitations in the evaluation of recent parallel...
