Joint Representation Training in Sequential Tasks with Shared Structure

by Aldo Pacchiano, et al.

Classical theory in reinforcement learning (RL) predominantly focuses on the single-task setting, where an agent learns to solve a task through trial-and-error experience, given access to data only from that task. However, many recent empirical works have demonstrated the significant practical benefits of leveraging a joint representation trained across multiple, related tasks. In this work we theoretically analyze such a setting, formalizing the concept of task relatedness as a shared state-action representation that admits linear dynamics in all the tasks. We introduce the Shared-MatrixRL algorithm for the setting of Multitask MatrixRL. In the presence of P episodic tasks of dimension d sharing a joint r ≪ d low-dimensional representation, we show the regret on the P tasks can be improved from O(PHd√(NH)) to O((Hd√(rP) + HP√(rd))√(NH)) over N episodes of horizon H. These gains coincide with those observed in other linear models in contextual bandits and RL. In contrast with previous works that have studied multitask RL in other function approximation models, we show that in the presence of a bilinear optimization oracle and finite state-action spaces there exists a computationally efficient algorithm for multitask MatrixRL via a reduction to quadratic programming. We also develop a simple technique to shave off a √(H) factor from the regret upper bounds of some episodic linear problems.
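To make the size of the stated improvement concrete, the two regret expressions from the abstract can be compared numerically. The sketch below is only an illustration: it drops all constants and logarithmic factors hidden by the O(·) notation, and the function names and parameter values are assumptions chosen for this example, not part of the paper. Dividing the shared bound by the independent one, the common factor H√(NH) cancels and the ratio simplifies to √(r/P) + √(r/d), so a gain appears when r is small relative to both P and d.

```python
import math


def independent_bound(P, H, d, N):
    """Leading term P*H*d*sqrt(N*H) for learning the P tasks separately.

    Constants and log factors are dropped (illustrative only).
    """
    return P * H * d * math.sqrt(N * H)


def shared_bound(P, H, d, r, N):
    """Leading term (H*d*sqrt(r*P) + H*P*sqrt(r*d)) * sqrt(N*H).

    This is the shared-representation expression from the abstract,
    again without constants or log factors.
    """
    return (H * d * math.sqrt(r * P) + H * P * math.sqrt(r * d)) * math.sqrt(N * H)


def improvement_ratio(P, d, r):
    """Closed form of shared_bound / independent_bound.

    H and N cancel, leaving sqrt(r/P) + sqrt(r/d).
    """
    return math.sqrt(r / P) + math.sqrt(r / d)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    P, H, d, r, N = 100, 10, 100, 4, 1000
    ratio = shared_bound(P, H, d, r, N) / independent_bound(P, H, d, N)
    print(f"shared / independent = {ratio:.3f}")
    print(f"closed form          = {improvement_ratio(P, d, r):.3f}")
```

With these hypothetical values (P = d = 100, r = 4) the ratio is about 0.4. When r is comparable to P or d, the ratio approaches or exceeds 1 and the expressions no longer indicate a gain.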




Near-optimal Representation Learning for Linear Bandits and Linear RL

This paper studies representation learning for multi-task linear bandits...

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

We give novel algorithms for multi-task and lifelong linear bandits with...

Multi-task Representation Learning with Stochastic Linear Bandits

We study the problem of transfer-learning in the setting of stochastic l...

Provable General Function Class Representation Learning in Multitask Bandits and MDPs

While multitask representation learning has become a popular approach in...

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

We study the problem of planning under model uncertainty in an online me...

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

We study lifelong reinforcement learning (RL) in a regret minimization s...

Provable Pathways: Learning Multiple Tasks over Multiple Paths

Constructing useful representations across a large number of tasks is a ...
