
ThoughtSource: A central hub for large language model reasoning data

by Simon Ott, et al.

Large language models (LLMs) such as GPT-3 and ChatGPT have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present the first release of ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates six scientific/medical, three general-domain and five math word question answering datasets.
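To make the idea of chain-of-thought data concrete, the following is a minimal sketch of how a CoT question-answering record might be represented and rendered as a prompt. The field names (`question`, `reasoning_steps`, `answer`) and the `build_prompt` helper are illustrative assumptions for this sketch, not the actual ThoughtSource schema or API.

```python
# Illustrative sketch: a CoT record and a prompt builder.
# Field names are assumptions, not the ThoughtSource schema.
from dataclasses import dataclass
from typing import List

@dataclass
class CoTExample:
    question: str
    reasoning_steps: List[str]  # verbalized chain of thought
    answer: str

def build_prompt(example: CoTExample) -> str:
    """Render a record as a chain-of-thought prompt fragment."""
    steps = " ".join(example.reasoning_steps)
    return (
        f"Q: {example.question}\n"
        f"A: Let's think step by step. {steps} "
        f"The answer is {example.answer}."
    )

sample = CoTExample(
    question="A farmer has 3 pens with 4 sheep each. How many sheep in total?",
    reasoning_steps=[
        "Each pen holds 4 sheep.",
        "There are 3 pens, so 3 * 4 = 12.",
    ],
    answer="12",
)
print(build_prompt(sample))
```

A meta-dataset like ThoughtSource standardizes many such datasets into one shared format, so that tooling for qualitative inspection, evaluation, and training can be written once against a common record structure.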

Related papers:

- Chain of Thought Prompting Elicits Reasoning in Large Language Models
- Unlocking Temporal Question Answering for Large Language Models Using Code Execution
- An automatically discovered chain-of-thought prompt generalizes to novel models and datasets
- Chain-Of-Thought Prompting Under Streaming Batch: A Case Study
- Large Language Model Programs
- Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries
- Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA