Time-Variant Variational Transfer for Value Functions
In most transfer learning approaches to reinforcement learning (RL), the distribution over tasks is assumed to be stationary, so that the source and target tasks are i.i.d. samples from the same distribution. In this work, we consider the problem of transferring value functions through a variational method when the distribution generating the tasks is time-variant, and we propose a solution that leverages the temporal structure inherent to the task-generating process. Moreover, by means of a finite-sample analysis, we theoretically compare the proposed solution to its time-invariant counterpart. Finally, we provide an experimental evaluation of the proposed technique under three distinct time dynamics in three different RL environments.
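To make the non-stationary setting concrete, the following is a minimal, purely illustrative sketch (not taken from the paper) of a time-variant task-generating process: the parameter defining each task, here a 1-D goal position, drifts deterministically over time, so tasks drawn at successive time steps are correlated rather than i.i.d. The function name, drift dynamics, and noise level are all assumptions for illustration.

```python
import math
import random

def sample_task(t, noise=0.05, rng=None):
    """Sample a task parameter (a 1-D goal position) at time t.

    The mean goal drifts sinusoidally with t, so the task distribution
    is time-variant: tasks observed at nearby times are similar, which
    is the temporal structure a transfer method could exploit.
    """
    rng = rng or random
    mean_goal = math.sin(0.1 * t)             # deterministic temporal drift
    return mean_goal + rng.gauss(0.0, noise)  # stochastic per-task variation

# Source tasks observed up to time T, then a target task at time T:
rng = random.Random(0)
T = 20
source_tasks = [sample_task(t, rng=rng) for t in range(T)]
target_task = sample_task(T, rng=rng)
```

In the stationary setting, the target task would be exchangeable with the source tasks; here, its distribution at time `T` differs from the earlier ones, and the recent source tasks are the most informative for transfer.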