Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization

by   Homayoon Farrahi, et al.

Continuous-time reinforcement learning tasks commonly use discrete steps of fixed cycle times for actions. As practitioners need to choose the action-cycle time for a given task, a significant concern is whether the hyper-parameters of the learning algorithm need to be re-tuned for each choice of the cycle time, which is prohibitive for real-world robotics. In this work, we investigate the widely-used baseline hyper-parameter values of two policy gradient algorithms – PPO and SAC – across different cycle times. Using a benchmark task where the baseline hyper-parameters of both algorithms were shown to work well, we reveal that when a cycle time different than the task default is chosen, PPO with baseline hyper-parameters fails to learn. Moreover, both PPO and SAC with their baseline hyper-parameters perform substantially worse than their tuned values for each cycle time. We propose novel approaches for setting these hyper-parameters based on the cycle time. In our experiments on simulated and real-world robotic tasks, the proposed approaches performed at least as well as the baseline hyper-parameters, with significantly better performance for most choices of the cycle time, and did not result in learning failure for any cycle time. Hyper-parameter tuning still remains a significant barrier for real-world robotics, as our approaches require some initial tuning on a new task, even though it is negligible compared to an extensive tuning for each cycle time. Our approach requires no additional tuning after the cycle time is changed for a given task and is a step toward avoiding extensive and costly hyper-parameter tuning for real-world policy optimization.


page 5

page 7

page 13

page 14

page 15

page 16

page 17

page 18


On Hyper-parameter Tuning for Stochastic Optimization Algorithms

This paper proposes the first-ever algorithmic framework for tuning hype...

Hyper-Parameter Auto-Tuning for Sparse Bayesian Learning

Choosing the values of hyper-parameters in sparse Bayesian learning (SBL...

Making a Science of Model Search

Many computer vision algorithms depend on a variety of parameter choices...

Learning to Do or Learning While Doing: Reinforcement Learning and Bayesian Optimisation for Online Continuous Tuning

Online tuning of real-world plants is a complex optimisation problem tha...

Adaptive Structural Hyper-Parameter Configuration by Q-Learning

Tuning hyper-parameters for evolutionary algorithms is an important issu...

Hyper-parameter Tuning under a Budget Constraint

We study a budgeted hyper-parameter tuning problem, where we optimize th...

Benchmarking Reinforcement Learning Algorithms on Real-World Robots

Through many recent successes in simulation, model-free reinforcement le...

Please sign up or login with your details

Forgot password? Click here to reset