Safe Reinforcement Learning for Legged Locomotion

by Tsung-Yen Yang, et al.

Designing control policies for legged locomotion is complex due to the under-actuated and non-continuous robot dynamics. Model-free reinforcement learning provides promising tools to tackle this challenge. However, a major bottleneck in applying model-free reinforcement learning in the real world is safety. In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy, which prevents the robot from entering unsafe states, and a learner policy, which is optimized to complete the task. The safe recovery policy takes over control when the learner policy violates safety constraints, and hands control back once no future safety violations are predicted. We design the safe recovery policy so that it ensures the safety of legged locomotion while minimally intervening in the learning process. Furthermore, we theoretically analyze the proposed framework and provide an upper bound on the task performance. We verify the proposed framework in four locomotion tasks on a simulated and real quadrupedal robot: efficient gait, catwalk, two-leg balance, and pacing. On average, our method achieves 48.6% fewer falls and comparable or better rewards than the baseline methods in simulation. When deployed on a real-world quadruped robot, our training pipeline enables a 34% improvement in energy efficiency for the efficient gait, 40.9% narrower feet placement in the catwalk, and two times more jumping duration in the two-leg balance. Our method achieves fewer than five falls over 115 minutes of hardware time.
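
The switching rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D dynamics in `step`, the safety bound in `is_safe`, the rollout check `stays_safe`, and both toy policies are hypothetical stand-ins chosen only to show a learner being overridden when its proposed action would lead to an unsafe state, and regaining control once its action is predicted to be safe again.

```python
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


def step(state: float, action: float, dt: float = 0.1) -> float:
    """Toy 1-D dynamics (an assumption): the action is a velocity command."""
    return state + dt * action


def is_safe(state: float, limit: float = 1.0) -> bool:
    """Hypothetical safety constraint: the state must stay within +/- limit."""
    return abs(state) <= limit


def stays_safe(state: float, first_action: float, recovery: Policy,
               horizon: int = 10) -> bool:
    """Apply first_action once, then roll out the recovery policy.

    Returns True only if every predicted state is safe.
    """
    s = step(state, first_action)
    if not is_safe(s):
        return False
    for _ in range(horizon):
        s = step(s, recovery(s))
        if not is_safe(s):
            return False
    return True


def select_action(state: float, learner: Policy,
                  recovery: Policy) -> Tuple[float, str]:
    """Use the learner's action unless it is predicted to violate safety."""
    proposed = learner(state)
    if stays_safe(state, proposed, recovery):
        return proposed, "learner"
    return recovery(state), "recovery"


def run_episode(num_steps: int = 50) -> Tuple[List[float], List[str]]:
    """Run the switching controller with two toy policies."""
    learner: Policy = lambda s: 3.0        # always pushes toward the bound
    recovery: Policy = lambda s: -2.0 * s  # drives the state back toward 0
    state, states, modes = 0.0, [], []
    for _ in range(num_steps):
        action, mode = select_action(state, learner, recovery)
        state = step(state, action)
        states.append(state)
        modes.append(mode)
    return states, modes


if __name__ == "__main__":
    states, modes = run_episode()
    print("max |state|:", max(abs(s) for s in states))
    print("recovery interventions:", modes.count("recovery"))
```

In this toy setting the learner alone would leave the safe region after a few steps. The recovery policy intervenes only on the steps where the learner's proposed action is predicted to be unsafe, which loosely mirrors the goal of intervening minimally in the learning process.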


Bayesian Optimization Meets Hybrid Zero Dynamics: Safe Parameter Learning for Bipedal Locomotion Control

In this paper, we propose a multi-domain control parameter learning fram...

Data Efficient Reinforcement Learning for Legged Robots

We present a model-based framework for robot locomotion that achieves wa...

On Safety Testing, Validation, and Characterization with Scenario-Sampling: A Case Study of Legged Robots

The dynamic response of the legged robot locomotion is non-Lipschitz and...

Value constrained model-free continuous control

The naive application of Reinforcement Learning algorithms to continuous...

SafeSteps: Learning Safer Footstep Planning Policies for Legged Robots via Model-Based Priors

We present a footstep planning policy for quadrupedal locomotion that is...

Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion

Several earlier studies have shown impressive control performance in com...

Model-Free Error Detection and Recovery for Robot Learning from Demonstration

Learning from human demonstrations can facilitate automation but is risk...
