Enhance Reasoning Ability of Visual-Language Models via Large Language Models

05/22/2023
by   Yueting Yang, et al.

Pre-trained visual language models (VLMs) have shown excellent performance on image captioning tasks, but they sometimes exhibit insufficient reasoning ability. In contrast, large language models (LLMs) have emerged with powerful reasoning capabilities. We therefore propose TReE, a method that transfers the reasoning ability of a large language model to a visual language model in zero-shot scenarios. TReE contains three stages: observation, thinking, and re-thinking. In the observation stage, the VLM obtains the overall information of the given image. In the thinking stage, the image information and the task description are combined into a prompt for the LLM, which infers a rationale. In the re-thinking stage, the VLM learns from the rationale and infers the final result.
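The three-stage pipeline above can be sketched as a simple chain of calls. This is a hypothetical illustration, not the paper's implementation: `vlm_caption`, `llm_reason`, and `vlm_answer` are placeholder stubs standing in for unspecified VLM and LLM backends.

```python
# Hypothetical sketch of the TReE observation -> thinking -> re-thinking flow.
# The three model-call functions are stubs; real systems would wrap a VLM
# (e.g. a captioning model) and an LLM behind these interfaces.

def vlm_caption(image: str) -> str:
    """Observation: the VLM summarizes the overall content of the image."""
    return f"an image showing {image}"

def llm_reason(caption: str, task: str) -> str:
    """Thinking: the LLM turns caption + task description into a rationale."""
    prompt = f"Image: {caption}\nTask: {task}\nRationale:"
    # A real LLM call would go here; we return a canned rationale.
    return f"The image shows {image_free(caption)}; reason step by step about the task."

def image_free(caption: str) -> str:
    """Helper: strip the stub prefix so the rationale reads naturally."""
    return caption.removeprefix("an image showing ")

def vlm_answer(image: str, rationale: str, task: str) -> str:
    """Re-thinking: the VLM infers the final answer, guided by the rationale."""
    return f"answer to {task!r} for {image!r}, guided by the LLM rationale"

def tree_pipeline(image: str, task: str) -> str:
    caption = vlm_caption(image)              # stage 1: observation
    rationale = llm_reason(caption, task)     # stage 2: thinking
    return vlm_answer(image, rationale, task) # stage 3: re-thinking

print(tree_pipeline("a dog on a skateboard", "What is the dog doing?"))
```

The key design point is that the VLM is queried twice: once without guidance to observe, and once conditioned on the LLM-generated rationale to produce the final answer.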

Related research

08/30/2023 · Response: Emergent analogical reasoning in large language models
In their recent Nature Human Behaviour paper, "Emergent analogical reaso...

03/24/2021 · Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2
Thinking aloud is an effective meta-cognitive strategy human reasoners a...

10/23/2022 · Do Language Models Understand Measurements?
Recent success of pre-trained language models (PLMs) has stimulated inte...

06/30/2023 · Look, Remember and Reason: Visual Reasoning with Grounded Rationales
Large language models have recently shown human level performance on a v...

04/28/2023 · Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning
Language models have been successfully applied to a variety of reasonin...

10/04/2022 · ThinkSum: Probabilistic reasoning over sets using large language models
Large language models (LLMs) have a substantial capacity for high-level ...

05/05/2023 · LMEye: An Interactive Perception Network for Large Language Models
Training a Large Visual Language Model (LVLM) from scratch, like GPT-4, ...
