We provide a first proof of the convergence of quantile TD-learning, a distributional RL algorithm that drives multiple recent empirical breakthroughs... In deep RL practice, we estimate discounted value functions with a small discount factor, yet...