FUNDAMENTALS OF REINFORCEMENT LEARNING

By attending the Fundamentals of Reinforcement Learning workshop, participants will:

  • Formalize problems as Markov Decision Processes 
  • Understand basic exploration methods and the exploration/exploitation tradeoff
  • Understand value functions as a general-purpose tool for optimal decision-making
  • Know how to implement dynamic programming as an efficient solution approach to an industrial control problem (see the sketch after this list)
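
To make the last objective concrete, here is a minimal, illustrative sketch of iterative policy evaluation, one of the dynamic programming methods this material builds toward. The two-state MDP, equiprobable policy, and parameter values below are hypothetical choices made for brevity, not examples taken from the course:

    import numpy as np

    # Hypothetical two-state MDP: transitions[s][a] is a list of
    # (probability, next_state, reward) tuples. All numbers are made up.
    transitions = {
        0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
        1: {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 2.0)]},
    }
    policy = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}  # equiprobable policy
    gamma = 0.9   # discount factor
    theta = 1e-8  # convergence threshold

    V = np.zeros(len(transitions))  # state-value estimates, one per state
    while True:
        delta = 0.0
        for s in transitions:
            v_old = V[s]
            # Bellman expectation update: expected one-step reward plus
            # discounted value of the next state, averaged over the policy.
            V[s] = sum(
                pi_a * sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])
                for a, pi_a in policy[s].items()
            )
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:  # stop once the largest update is negligible
            break

    print(V)  # approximate values of each state under the equiprobable policy

The same sweep-until-convergence pattern extends naturally to policy improvement and policy iteration, which appear in the agenda below.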

Reinforcement Learning is a subfield of Machine Learning, but it is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the importance and challenges of learning agents that make decisions is vital today, with more and more companies interested in interactive agents and intelligent decision-making.

This course teaches you the key concepts of Reinforcement Learning that underlie classic and modern RL algorithms. After completing this course, you will be able to start using RL for real problems where you have, or can specify, the MDP.

COURSE AGENDA

  • Specialization Introduction
  • Course Introduction
  • Sequential Decision Making with Evaluative Feedback
  • Learning Action Values
  • Estimating Action Values Incrementally
  • What is the trade-off?
  • Optimistic Initial Values
  • Upper-Confidence Bound (UCB) Action Selection
  • Jonathan Langford: Contextual Bandits for Real World Reinforcement Learning
  • Markov Decision Processes
  • Examples of MDPs
  • The Goal of Reinforcement Learning
  • Michael Littman: The Reward Hypothesis
  • Continuing Tasks
  • Examples of Episodic and Continuing Tasks
  • Specifying Policies
  • Value Functions
  • Rich Sutton and Andy Barto: A Brief History of RL
  • Bellman Equation Derivation
  • Why Bellman Equations?
  • Optimal Policies
  • Optimal Value Functions
  • Using Optimal Value Functions to Get Optimal Policies
  • Policy Evaluation vs. Control
  • Iterative Policy Evaluation
  • Policy Improvement
  • Policy Iteration
  • Flexibility of the Policy Iteration Framework
  • Efficiency of Dynamic Programming
  • Warren Powell: Approximate Dynamic Programming for Fleet Management (Short)
  • Warren Powell: Approximate Dynamic Programming for Fleet Management (Long)
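
As a taste of the agenda items on learning and estimating action values and the exploration/exploitation trade-off, here is a minimal, illustrative sketch of incremental sample-average action-value estimates with epsilon-greedy exploration, one simple exploration method of the kind covered in the course. The three arm means, the Gaussian reward noise, and the epsilon value are made-up numbers used only for illustration:

    import random

    # Hypothetical three-armed bandit: the true reward means are hidden from
    # the agent and are made-up numbers used only for this illustration.
    true_means = [0.2, 0.5, 0.8]
    Q = [0.0] * len(true_means)  # incremental action-value estimates
    N = [0] * len(true_means)    # how many times each arm has been pulled
    epsilon = 0.1                # probability of exploring a random arm

    for step in range(10_000):
        if random.random() < epsilon:
            a = random.randrange(len(Q))                # explore: random arm
        else:
            a = max(range(len(Q)), key=lambda i: Q[i])  # exploit: greedy arm
        reward = random.gauss(true_means[a], 1.0)       # noisy reward from arm a
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]                  # incremental sample average

    print([round(q, 2) for q in Q])  # estimates should approach the true means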