Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

by Xizhou Zhu, et al.

The captivating realm of Minecraft has attracted substantial research interest in recent years, serving as a rich platform for developing intelligent agents capable of functioning in open-world environments. However, the current research landscape predominantly focuses on specific objectives, such as the popular "ObtainDiamond" task, and has not yet shown effective generalization to a broader spectrum of tasks. Furthermore, the current leading success rate for the "ObtainDiamond" task stands at around 20%, highlighting the limitations of the Reinforcement Learning (RL) based controllers used in existing methods. To tackle these challenges, we introduce Ghost in the Minecraft (GITM), a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory, aiming to create Generally Capable Agents (GCAs) in Minecraft. These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions. We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute. The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate on the "ObtainDiamond" task, demonstrating superior robustness compared to traditional RL-based controllers. Notably, our agent is the first to procure all items in the Minecraft Overworld technology tree, demonstrating its extensive capabilities. GITM requires no GPU for training; a single CPU node with 32 CPU cores is enough. This research shows the potential of LLMs in developing capable agents for handling long-horizon, complex tasks and adapting to uncertainties in open-world environments. See the project website at
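To make the idea of structured actions driven by an LLM-generated plan more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `StructuredAction` schema, the `KNOWLEDGE` table, and the functions `plan_with_stub_llm` and `execute` are all hypothetical names introduced for illustration, the prerequisite table is a toy stand-in for text-based knowledge, and the "LLM" is replaced by a deterministic depth-first expansion so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Toy text-based knowledge: item -> prerequisite items (illustrative only,
# not real game data and not taken from the paper).
KNOWLEDGE: Dict[str, List[str]] = {
    "wooden_pickaxe": ["planks", "stick"],
    "stick": ["planks"],
    "planks": ["log"],
    "log": [],
}


@dataclass
class StructuredAction:
    """Hypothetical structured action: a verb plus named text arguments."""
    name: str  # e.g. "mine" or "craft"
    args: Dict[str, str] = field(default_factory=dict)


def plan_with_stub_llm(goal: str, inventory: Set[str]) -> List[StructuredAction]:
    """Stand-in for an LLM planner: expand the goal depth-first over KNOWLEDGE."""
    plan: List[StructuredAction] = []
    planned = set(inventory)  # items already held or already scheduled

    def expand(item: str) -> None:
        if item in planned:
            return
        prereqs = KNOWLEDGE[item]
        for prereq in prereqs:
            expand(prereq)
        # Items without prerequisites are gathered; the rest are crafted.
        verb = "craft" if prereqs else "mine"
        plan.append(StructuredAction(verb, {"object": item}))
        planned.add(item)

    expand(goal)
    return plan


def execute(plan: List[StructuredAction], inventory: Set[str]) -> List[str]:
    """Run the plan in a toy world and return text feedback as a memory log."""
    memory: List[str] = []
    for action in plan:
        item = action.args["object"]
        missing = [p for p in KNOWLEDGE[item] if p not in inventory]
        if missing:
            # Stop at the first failure; a real agent would replan here.
            memory.append(f"{action.name} {item}: failed, missing {missing}")
            break
        inventory.add(item)
        memory.append(f"{action.name} {item}: success")
    return memory


inventory: Set[str] = set()
plan = plan_with_stub_llm("wooden_pickaxe", inventory)
memory = execute(plan, inventory)
for line in memory:
    print(line)
```

In this sketch the plan is a list of text-level actions rather than low-level controls, and the feedback strings collected in `memory` are the kind of textual record a planner could be shown on the next call. How GITM actually structures its actions, knowledge, and memory is described in the paper itself, not here.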



ScriptWorld: Text Based Environment For Learning Procedural Knowledge

Text-based games provide a framework for developing natural language und...

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Autonomous agents have made great strides in specialist domains like Ata...

WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

Recent advances in deep reinforcement learning (RL) have demonstrated co...

Augmentative Topology Agents For Open-Ended Learning

In this work, we tackle the problem of open-ended learning by introducin...

The NetHack Learning Environment

Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand wit...

You Only Look at Screens: Multimodal Chain-of-Action Agents

Autonomous user interface (UI) agents aim to facilitate task automation ...

AgentBench: Evaluating LLMs as Agents

Large Language Models (LLMs) are becoming increasingly smart and autonom...
