Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition

by   Jonah Siekmann, et al.

We study the problem of realizing the full spectrum of bipedal locomotion on a real robot with sim-to-real reinforcement learning (RL). A key challenge of learning legged locomotion is describing different gaits, via reward functions, in a way that is intuitive for the designer and specific enough to reliably learn the gait across different initial random seeds or hyperparameters. A common approach is to use reference motions (e.g. trajectories of joint positions) to guide learning. However, finding high-quality reference motions can be difficult and the trajectories themselves narrowly constrain the space of learned motion. At the other extreme, reference-free reward functions are often underspecified (e.g. move forward) leading to massive variance in policy behavior, or are the product of significant reward-shaping via trial-and-error, making them exclusive to specific gaits. In this work, we propose a reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities. We instantiate this framework to define a parametric reward function with intuitive settings for all common bipedal gaits - standing, walking, hopping, running, and skipping. Using this function we demonstrate successful sim-to-real transfer of the learned gaits to the bipedal robot Cassie, as well as a generic policy that can transition between all of the two-beat gaits.


page 1

page 2

page 4


Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

Training a high-dimensional simulated agent with an under-specified rewa...

Gait Library Synthesis for Quadruped Robots via Augmented Random Search

In this paper, with a view toward fast deployment of learned locomotion ...

Optimizing Bipedal Maneuvers of Single Rigid-Body Models for Reinforcement Learning

In this work, we propose a method to generate reduced-order model refere...

Custom Sine Waves Are Enough for Imitation Learning of Bipedal Gaits with Different Styles

Not until recently, robust bipedal locomotion has been achieved through ...

OPT-Mimic: Imitation of Optimized Trajectories for Dynamic Quadruped Behaviors

Reinforcement Learning (RL) has seen many recent successes for quadruped...

Learning a Single Policy for Diverse Behaviors on a Quadrupedal Robot using Scalable Motion Imitation

Learning various motor skills for quadrupedal robots is a challenging pr...

Emergence of Locomotion Behaviours in Rich Environments

The reinforcement learning paradigm allows, in principle, for complex be...

Please sign up or login with your details

Forgot password? Click here to reset