Learning Control by Iterative Inversion

by Gal Leibovich et al.

We formulate learning for control as an inverse problem – inverting a dynamical system to give the actions that yield desired behavior. The key challenge in this formulation is a distribution shift – the learning agent only observes the forward mapping (its actions' consequences) on trajectories that it can execute, yet must learn the inverse mapping for input–output pairs that correspond to a different, desired behavior. We propose a general recipe for inverse problems with a distribution shift that we term iterative inversion – learn the inverse mapping under the current input distribution (policy), then use it on the desired output samples to obtain new inputs, and repeat. As we show, iterative inversion can converge to the desired inverse mapping, but only under rather strict conditions on the mapping itself. We next apply iterative inversion to learn control. Our input is a set of demonstrations of desired behavior, given as video embeddings of trajectories, and our method iteratively learns to imitate trajectories generated by the current policy, perturbed by random exploration noise. We find that constantly feeding the demonstrated trajectory embeddings as input to the policy when generating trajectories to imitate, à la iterative inversion, steers the learning towards the desired trajectory distribution. To the best of our knowledge, this is the first exploration of learning control from the viewpoint of inverse problems, and our main advantage is simplicity – we do not require rewards, and only employ supervised learning, which easily scales to state-of-the-art trajectory embedding techniques and policy representations. With a VQ-VAE embedding and a transformer-based policy, we demonstrate non-trivial continuous control on several tasks. We also report improved performance on imitating diverse behaviors compared to reward-based methods.
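To make the recipe concrete, here is a minimal toy sketch of iterative inversion on a scalar function rather than a dynamical system: fit an inverse model on data from the current input distribution, apply it to the desired output to obtain new inputs, and repeat. The forward map `f`, the linear inverse model, and all hyperparameters are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def f(x):
    # monotonic nonlinear forward mapping (a stand-in for the system to invert)
    return x ** 3 + x

rng = np.random.default_rng(0)
y_desired = 5.0      # desired output (plays the role of the demonstrations)
x_center = 0.0       # center of the current input distribution
noise_std = 0.2      # random exploration noise

for _ in range(15):
    # sample inputs from the current distribution and observe their outputs
    x = x_center + noise_std * rng.normal(size=256)
    y = f(x)
    # fit a local linear inverse model g: y -> x on the current data
    A = np.stack([y, np.ones_like(y)], axis=1)
    coef, intercept = np.linalg.lstsq(A, x, rcond=None)[0]
    # apply the learned inverse to the desired output -> new input distribution
    x_center = coef * y_desired + intercept

print(x_center, f(x_center))
```

Because the inverse model is only fit where the current inputs lie, each round shifts the input distribution toward the preimage of `y_desired`, and the local fit becomes accurate exactly where it is needed – the same mechanism the abstract describes for steering the policy toward the demonstrated trajectory distribution.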


