Offline Reinforcement Learning

Guy Tennenholtz

Abstract: Offline reinforcement learning (offline RL), a.k.a. batch-mode reinforcement learning, involves learning a policy from previously collected, potentially suboptimal data. In contrast to imitation learning, offline RL does not rely on expert demonstrations, but rather seeks to surpass the average performance of the agents that generated the data. Methodologies that rely on gathering new experience fall short in offline settings, requiring a reassessment of fundamental learning paradigms. In this tutorial I aim to provide the necessary background and outline the challenges of this exciting area of research, from off-policy evaluation through bandits to deep reinforcement learning.

Bio: Guy Tennenholtz is a fourth-year Ph.D. student at the Technion, advised by Prof. Shie Mannor. His research interests lie in reinforcement learning, and specifically in how offline data can be leveraged to build better agents. Large action spaces, partial observability, confounding bias, and uncertainty are some of the problems he is actively researching. In his spare time, Guy also enjoys creating mobile games, with the vision of incorporating AI into both the game development process and gameplay.