Show me the Way: Intrinsic Motivation from Demonstrations

06/23/2020
by Léonard Hussenot, et al.

The study of exploration in Reinforcement Learning (RL) has a long history, but it remains an unsolved problem. Recent approaches applied to Deep RL are based on the concept of intrinsic motivation and are implemented as an exploration bonus, added to the environment reward, that encourages exhaustively visiting the whole state-action space as fast as possible. This approach is supported by the extensive theory of RL, in which convergence to optimality assumes exhaustive exploration. Yet humans and other mammals do not exhaustively explore the world, and their motivation is based not only on novelty but also on diverse other factors (e.g., curiosity, fun, style, pleasure, safety, competition). They optimize for life-long learning and train to learn transferable skills in playgrounds without obvious goals. They also apply innate or learned priors to save time and stay safe. For these reasons, we propose a method for learning an exploration bonus from demonstrations that could transfer these motivations to an artificial agent without explicitly modeling them. Using an inverse RL approach, we show that different exploration behaviors can be learned and efficiently used by RL agents to solve tasks for which exhaustive exploration is prohibitive.
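To make the phrase "exploration bonus, added to the environment reward" concrete, here is a minimal sketch of a generic count-based novelty bonus, the kind of exhaustive-exploration incentive the abstract contrasts with. It is not the method proposed in the paper, which learns the bonus from demonstrations. The names `CountBonus`, `shaped_reward`, and the scale `beta` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based novelty bonus: beta / sqrt(N(s, a)).

    An illustrative stand-in for an intrinsic-motivation bonus,
    not the demonstration-learned bonus described in the abstract.
    """

    def __init__(self, beta=0.1):
        self.beta = beta                 # assumed bonus scale
        self.counts = defaultdict(int)   # visit counts per (state, action)

    def __call__(self, state, action):
        # Record the visit, then return a bonus that shrinks with repeats.
        self.counts[(state, action)] += 1
        return self.beta / math.sqrt(self.counts[(state, action)])


def shaped_reward(env_reward, bonus):
    """Reward seen by the agent: environment reward plus exploration bonus."""
    return env_reward + bonus


bonus_fn = CountBonus(beta=1.0)
first = bonus_fn("s0", 0)    # first visit to ("s0", 0)
second = bonus_fn("s0", 0)   # repeat visit, smaller bonus
total = shaped_reward(0.5, first)
```

Under this sketch, rarely visited state-action pairs receive a larger bonus, which pushes the agent toward covering the whole space. A bonus learned from demonstrations would replace `CountBonus` with a function fitted to demonstrated behavior.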
