Multimodal Pretrained Models for Sequential Decision-Making: Synthesis, Verification, Grounding, and Perception

by Yunhao Yang, et al.

Recently developed pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be directly integrated into algorithms to solve sequential decision-making tasks. We develop an algorithm that utilizes the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks, and to ground these controllers to task environments through visual observations. In particular, the algorithm queries a pretrained model with a user-provided, text-based task description and uses the model's output to construct an automaton-based controller that encodes the model's task-relevant knowledge. It then verifies whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. If this verification step discovers any inconsistency, the algorithm automatically refines the controller to resolve the inconsistency. Next, the algorithm leverages the vision and language capabilities of pretrained models to ground the controller to the task environment. It collects image-based observations from the task environment and uses the pretrained model to link these observations to the text-based control logic encoded in the controller (e.g., actions and conditions that trigger the actions). We propose a mechanism to ensure the controller satisfies the user-provided specification even when perceptual uncertainties are present. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks, including daily life and robot manipulation tasks.
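
The construct, verify, refine, and ground steps described above can be illustrated with a toy sketch. This is not the authors' implementation: the road-crossing task, the dictionary encoding of the controller, the forbidden-pair form of the specification, and the confidence threshold used to handle perceptual uncertainty are all assumptions made for illustration. In the paper's setting the controller would be built from a pretrained model's text output and the confidence would come from a vision-language model; here both are hard-coded.

```python
from collections import deque

# Hypothetical controller: (state, observed condition) -> (action, next state).
# Hard-coded here; assumed to stand in for a controller built from model output.
controller = {
    ("wait", "green light"): ("cross road", "done"),
    ("wait", "red light"): ("stay", "wait"),
    ("wait", "car approaching"): ("cross road", "done"),  # unsafe on purpose
}

# Hypothetical specification: (condition, action) pairs that must never occur.
forbidden = {("car approaching", "cross road")}


def find_violations(controller, forbidden, initial_state):
    """Return reachable transitions whose (condition, action) pair is forbidden."""
    violations = []
    seen = {initial_state}
    queue = deque([initial_state])
    while queue:
        state = queue.popleft()
        for (src, condition), (action, dst) in controller.items():
            if src != state:
                continue
            if (condition, action) in forbidden:
                violations.append((src, condition, action))
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return violations


def refine(controller, violations, safe_action="stay"):
    """Replace each violating transition with a self-loop taking a safe action."""
    refined = dict(controller)
    for src, condition, _ in violations:
        refined[(src, condition)] = (safe_action, src)
    return refined


def step(controller, state, condition, confidence,
         threshold=0.8, safe_action="stay"):
    """Advance one step; fall back to the safe action if perception is unsure."""
    if confidence < threshold or (state, condition) not in controller:
        return safe_action, state
    return controller[(state, condition)]


violations = find_violations(controller, forbidden, "wait")
refined = refine(controller, violations)
```

Calling `step(refined, "wait", "green light", 0.95)` follows the controller's transition, while the same call with a confidence below the assumed threshold keeps the agent in place. A full treatment would check temporal specifications against a model of the environment rather than a set of forbidden pairs; the reachability check here only conveys the shape of the verify-then-refine loop.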



