Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems

by Matilde Santos, et al.

In a Role-Playing Game, finding optimal trajectories is one of the most important tasks. In fact, the strategy decision system becomes a key component of a game engine. Determining the way in which decisions are taken (online, batch or simulated) and the resources consumed in decision making (e.g. execution time, memory) will, to a large degree, influence game performance. When classical search algorithms such as A* can be used, they are the very first option. Nevertheless, such methods rely on precise and complete models of the search space, and there are many interesting scenarios where their application is not possible. In those cases, model-free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy that incorporates the heuristic-search ability of path-finding into a Dyna agent. Like A*, the proposed Dyna-H algorithm selects the branches that are more likely than others to produce useful outcomes. In addition, it has the advantage of being a model-free online reinforcement learning algorithm. The proposal was evaluated against the one-step Q-Learning and Dyna-Q algorithms, with excellent experimental results: Dyna-H significantly outperforms both methods in all experiments. We also suggest a functional analogy between the proposed heuristic of sampling from the worst trajectories and the role of dreams (e.g. nightmares) in human behavior.
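The abstract contrasts Dyna-Q with Dyna-H. Dyna-Q replays previously observed transitions chosen at random during planning, while Dyna-H steers planning with a path-finding heuristic.

The following is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The assumed pieces are:

* a hypothetical 5x5 deterministic grid world with illustrative `START`, `GOAL` and reward values;
* a Euclidean-distance heuristic `h`;
* illustrative hyperparameters;
* a planning step that replays, in a remembered state, the action whose predicted next state lies farthest from the goal. This is one possible reading of "sampling from the worst trajectories."

Passing `heuristic=False` switches planning to Dyna-Q's uniform replay of observed state-action pairs.

```python
import math
import random
from collections import defaultdict

# Hypothetical 5x5 deterministic grid world (illustrative, not from the paper).
WIDTH, HEIGHT = 5, 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left


def step(state, action):
    """Move one cell, clipped to the grid; reward 1.0 on reaching the goal."""
    x = min(max(state[0] + action[0], 0), WIDTH - 1)
    y = min(max(state[1] + action[1], 0), HEIGHT - 1)
    nxt = (x, y)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def h(state):
    """Assumed heuristic: Euclidean distance from state to the goal."""
    return math.dist(state, GOAL)


def q_update(q, s, a, r, s2, done, alpha, gamma):
    """One-step Q-learning backup, shared by direct RL and planning."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def dyna_h(episodes=50, n_planning=10, alpha=0.1, gamma=0.95,
           epsilon=0.1, max_steps=1000, heuristic=True, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)      # Q[(state, action)], initialised to 0
    model = {}                  # learned model: (s, a) -> (r, s', done)
    seen = defaultdict(list)    # state -> actions already tried there
    steps_per_episode = []
    for _ in range(episodes):
        s, done, steps = START, False, 0
        while not done and steps < max_steps:
            # Epsilon-greedy action in the real environment (random tie-break).
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: (q[(s, b)], rng.random()))
            s2, r, done = step(s, a)
            q_update(q, s, a, r, s2, done, alpha, gamma)  # direct RL
            model[(s, a)] = (r, s2, done)                 # model learning
            if a not in seen[s]:
                seen[s].append(a)
            # Planning: n simulated backups drawn from the learned model.
            for _ in range(n_planning):
                if heuristic:
                    # Assumed Dyna-H-style choice: a remembered state, then the
                    # tried action whose predicted next state is farthest from goal.
                    ps = rng.choice(list(seen))
                    pa = max(seen[ps], key=lambda b: h(model[(ps, b)][1]))
                else:
                    # Dyna-Q: a previously observed (s, a) pair, chosen uniformly.
                    ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone, alpha, gamma)
            s = s2
            steps += 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

To explore the comparison the abstract describes, call `dyna_h()` and `dyna_h(heuristic=False)` with the same `seed`, then compare the returned steps-per-episode lists. The sketch makes no claim about which variant wins on this toy grid. Both variants share the same `q_update`, so the only difference between them is how planning picks which transitions to replay.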
