ALIGNEDLATENTMODELS.GITHUB.IO
While reinforcement learning (RL) methods that learn an internal model of the environment have the potential to be more sample-efficient than their model-free counterparts, learning to model raw observations from high-dimensional sensors can be challenging. To address this, prior work has instead learned low-dimensional representations of observations through auxiliary objectives such as reconstruction or value prediction, whose precise alignment with the RL objective often remains unclear. In this work, we propose a single objective that jointly optimizes these representations together with a model and a policy, and show that this objective is an evidence lower bound on expected returns...
Simplifying Model-based RL: Learning Representations, Latent-space Models and Policies with One Objective
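To make the "single objective" idea concrete, here is a minimal sketch of what one joint loss over an encoder, a latent-space model, and a reward head might look like. All names, dimensions, and the linear parameterization are illustrative assumptions for exposition, not the paper's actual architecture or bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
obs_dim, latent_dim = 16, 4

# Hypothetical linear encoder, latent dynamics model, and reward head.
W_enc = rng.normal(size=(latent_dim, obs_dim)) * 0.1
W_dyn = rng.normal(size=(latent_dim, latent_dim)) * 0.1
w_rew = rng.normal(size=latent_dim) * 0.1

def encode(obs):
    """Map a raw observation to a low-dimensional latent."""
    return W_enc @ obs

def predict_next_latent(z):
    """Latent-space model: predict the next latent from the current one."""
    return W_dyn @ z

def joint_objective(obs, next_obs):
    """One scalar objective touching encoder, model, and reward head.

    The reward term stands in for the return part of the bound; the
    consistency term (predicted next latent vs. encoded next observation)
    stands in for the KL term of an evidence-lower-bound-style objective.
    """
    z = encode(obs)
    z_next_pred = predict_next_latent(z)
    z_next = encode(next_obs)
    reward_term = float(w_rew @ z)
    consistency = -float(np.sum((z_next_pred - z_next) ** 2))
    return reward_term + consistency

obs = rng.normal(size=obs_dim)
next_obs = rng.normal(size=obs_dim)
print(joint_objective(obs, next_obs))
```

Because every component appears in the same scalar loss, a single gradient step updates the representation, the latent model, and the reward prediction together, rather than training each against a separate auxiliary objective.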