Quality-Diversity Optimisation on a Physical Robot Through Dynamics-Aware and Reset-Free Learning

by Simón C. Smith, et al.

Learning algorithms, like Quality-Diversity (QD), can be used to acquire repertoires of diverse robotic skills. This learning is commonly done in computer simulation due to the large number of evaluations required, but training in a virtual environment creates a gap between simulation and reality. Here, we build on the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot. The method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour and improve sample efficiency. A behaviour selection policy filters out controllers that the model predicts to be uninteresting or unsafe. RF-QD also includes a recovery policy that returns the robot to a safe zone whenever it walks outside of it, allowing learning to continue without resets. We demonstrate that our method enables a physical quadruped robot to learn a repertoire of behaviours in two hours without human supervision. We successfully test the resulting repertoire on a maze navigation task. Finally, we compare our approach to the MAP-Elites algorithm and show that both dynamics awareness and a recovery policy are required to generate a good archive when training on a physical robot. Video available at https://youtu.be/BgGNvIsRh7Q
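To make the loop described above concrete, here is a minimal, hypothetical sketch of a reset-free QD iteration: a learned dynamics model (here a trivial nearest-neighbour predictor) filters candidate controllers before execution, the robot's state persists across evaluations, and a recovery policy pulls it back into the safe zone. The toy 1-D "robot", the grid archive resolution, and the fitness function are all illustrative assumptions, not the paper's actual implementation.

```python
import random


class ResetFreeQD:
    """Sketch of a reset-free Quality-Diversity loop on a toy 1-D robot.

    Assumptions (not from the paper): the controller is a single scalar,
    the behaviour descriptor is the displacement it produces, the dynamics
    model is a nearest-neighbour lookup over past interactions, and the
    archive is a 1-D MAP-Elites-style grid.
    """

    def __init__(self, cells=10, safe_zone=(-1.0, 1.0), seed=0):
        self.rng = random.Random(seed)
        self.cells = cells
        self.safe_zone = safe_zone
        self.archive = {}    # cell index -> (params, fitness)
        self.history = []    # (params, displacement) pairs for the model
        self.position = 0.0  # robot state persists across evaluations

    def _execute(self, params):
        # Stand-in for running a controller on the physical robot.
        return params + 0.05 * self.rng.gauss(0.0, 1.0)

    def _predict(self, params):
        # Dynamics model: nearest neighbour over past interactions.
        if not self.history:
            return None
        _, displacement = min(self.history, key=lambda h: abs(h[0] - params))
        return displacement

    def _cell(self, descriptor):
        lo, hi = self.safe_zone
        t = (descriptor - lo) / (hi - lo)
        return min(self.cells - 1, max(0, int(t * self.cells)))

    def _recover(self):
        # Recovery policy: return the robot to the safe zone.
        lo, hi = self.safe_zone
        self.position = min(max(self.position, lo), hi)

    def step(self):
        params = self.rng.uniform(-1.5, 1.5)
        predicted = self._predict(params)
        lo, hi = self.safe_zone
        # Behaviour selection: skip controllers predicted to leave safety.
        if predicted is not None and not (lo <= self.position + predicted <= hi):
            return False
        displacement = self._execute(params)
        self.history.append((params, displacement))
        self.position += displacement
        if not (lo <= self.position <= hi):
            self._recover()
        cell = self._cell(displacement)
        fitness = -abs(params)  # illustrative: prefer low-effort controllers
        if cell not in self.archive or fitness > self.archive[cell][1]:
            self.archive[cell] = (params, fitness)
        return True


qd = ResetFreeQD()
for _ in range(200):
    qd.step()
print(len(qd.archive))
```

The key property shown is that evaluation never requires a manual reset: unsafe rollouts are filtered by the model before they happen, and the recovery policy handles the cases the model misses.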




