ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans

by Anna Lukina, et al.

We introduce ARES, an efficient approximation algorithm for generating optimal plans (action sequences) that take an initial state of a Markov Decision Process (MDP) to a state whose cost is below a specified (convergence) threshold. ARES uses Particle Swarm Optimization, with adaptive sizing for both the receding horizon and the particle swarm. Inspired by Importance Splitting, the length of the horizon and the number of particles are chosen such that at least one particle reaches a next-level state, that is, a state where the cost decreases by a required delta from the previous-level state. The level relation on states and the plans constructed by ARES implicitly define a Lyapunov function and an optimal policy, respectively, both of which could be explicitly generated by applying ARES to all states of the MDP, up to some topological equivalence relation. We also assess the effectiveness of ARES by statistically evaluating its rate of success in generating optimal plans. The ARES algorithm resulted from our desire to clarify whether flying in V-formation is a flocking policy that optimizes energy conservation, clear view, and velocity alignment. That is, we were interested in whether one could find optimal plans that bring a flock from an arbitrary initial state to a state exhibiting a single connected V-formation. For flocks with 7 birds, ARES is able to generate a plan that leads to a V-formation in 95.63 seconds, on average. ARES can also be easily customized into a model-predictive controller (MPC) with an adaptive receding horizon and statistical guarantees of convergence. To the best of our knowledge, our adaptive-sizing approach is the first to provide convergence guarantees in receding-horizon techniques.
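The core loop the abstract describes (grow the horizon and the particle count until some particle reaches a next-level state whose cost has dropped by a required delta, then commit that particle's actions) can be sketched in a few lines. The sketch below is not the authors' implementation: it uses random sampling in place of full Particle Swarm Optimization, and the toy one-dimensional dynamics, the `ares_plan` name, and all parameters (`delta`, `max_horizon`, `max_particles`) are illustrative assumptions.

```python
import random

def cost(state):
    """Cost of a state in a toy 1-D MDP: distance to the goal at 0."""
    return abs(state)

def step(state, action):
    """Toy deterministic dynamics: the action is added to the state."""
    return state + action

def rollout(state, actions):
    """Apply a horizon-length action sequence; return the end state."""
    for a in actions:
        state = step(state, a)
    return state

def ares_plan(x0, threshold=0.1, delta=0.5,
              max_horizon=5, max_particles=64, rng=None):
    """Adaptive receding-horizon planning, ARES-style: at each level,
    enlarge the horizon and the particle count until some particle
    reaches a next-level state (cost lowered by at least `delta`),
    then commit that particle's action sequence and repeat."""
    rng = rng or random.Random(0)
    state, plan = x0, []
    while cost(state) > threshold:
        target = max(cost(state) - delta, threshold)  # next-level cost
        best = None
        # Adaptive sizing: try short horizons / few particles first.
        for horizon in range(1, max_horizon + 1):
            particles = 8
            while particles <= max_particles and best is None:
                for _ in range(particles):
                    actions = [rng.uniform(-1, 1) for _ in range(horizon)]
                    end = rollout(state, actions)
                    if cost(end) <= target:
                        best = (actions, end)
                        break
                particles *= 2
            if best:
                break
        if best is None:
            raise RuntimeError("no particle reached the next level")
        actions, state = best
        plan.extend(actions)
    return plan, state

plan, final = ares_plan(5.0)
print(cost(final) <= 0.1)  # True: the plan drives the cost below threshold
```

The chain of committed next-level states is exactly what makes the level relation act like a Lyapunov function: each accepted segment strictly decreases the cost by at least `delta`, so the loop terminates once the cost falls below the convergence threshold.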



V-Formation via Model Predictive Control

We present recent results that demonstrate the power of viewing the prob...

Robust Batch Policy Learning in Markov Decision Processes

We study the sequential decision making problem in Markov decision proce...

Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We study reinforcement learning in an infinite-horizon average-reward se...

A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Constrained Markov Decision Processes (CMDPs) formalize sequential decis...

Formation control with connectivity assurance for missile swarm: a natural co-evolutionary strategy approach

Formation control problem is one of the most concerned topics within the...

Recurrent Model Predictive Control

This paper proposes an off-line algorithm, called Recurrent Model Predic...

Estimating action plans for smart poultry houses

In poultry farming, the systematic choice, update, and implementation of...
