Syllabus: Reinforcement Learning and Learning Control

All downloadable documents are PDF files. Adobe Acrobat Reader, which can open them, is available free of charge from Adobe.

Note: This syllabus will be modified continuously to accommodate the progress and interests of the course participants!

Sept. 3: Introduction to Reinforcement Learning. Readings: Slides; Sutton Book, Chapters 1-5
Sept. 10: Function Approximation in Reinforcement Learning; Optimal control along trajectories: LQR, LQG and DDP. Readings: Sutton Book, Chapter 8; Todorov2005
Sept. 17: Research on DDP and Function Approximation for RL. Readings: Tassa2007; Slides
Sept. 24: Research on DDP and Function Approximation in RL. Readings: Doya2000; Morimoto2003
Oct. 1: Gaussian Processes for Reinforcement Learning; Value function learning along trajectories (fitted Q iteration); Least Squares Temporal Difference Methods. Readings: Deisenroth2009; Lagoudakis2002; Ernst2005
Oct. 8: Policy Gradient Methods: REINFORCE, GPOMDP, Natural Gradients. Readings: Williams1992; Sutton2000; Peters2008; Slides
Oct. 15: Research on Policy Gradient Methods; Introduction to Path Integral Methods. Readings: Tedrake2005; Bagnell2003
Oct. 22: Path Integral Methods for Reinforcement Learning. Readings: Theodorou2010; Todorov2009; Kober2009
Oct. 29: Path Integral Methods for Reinforcement Learning (continued). Readings: Slides
Nov. 5: Sketch of Planned Projects; Modular Learning Control. Readings: Tedrake2009; Todorov2009
Nov. 12: Inverse reinforcement learning. Readings: Dvijotham2009; Abbeel2009; Ratliff2009
Nov. 19: Dynamic Bayesian networks for reinforcement learning. Readings: Toussaint2006; Vlassis2009
Dec. 3: Project presentations
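For the Sept. 10 topic on optimal control along trajectories, the backward Riccati recursion behind finite-horizon LQR can be sketched for a scalar linear system. This is a minimal illustration, not course material: the system x[t+1] = a x[t] + b u[t], the quadratic cost weights q, r, qf, and the function names lqr_scalar and rollout are assumptions chosen for the example.

```python
def lqr_scalar(a, b, q, r, qf, horizon):
    """Finite-horizon, discrete-time LQR for a scalar linear system (illustrative sketch).

    Dynamics: x[t+1] = a * x[t] + b * u[t]
    Cost:     sum_{t<T} (q * x[t]**2 + r * u[t]**2) + qf * x[T]**2
    Returns gains K (optimal control u[t] = -K[t] * x[t]) and value
    coefficients P (optimal cost-to-go V_t(x) = P[t] * x**2).
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = qf  # terminal cost-to-go
    for t in range(horizon - 1, -1, -1):  # dynamic programming runs backward in time
        p_next = P[t + 1]
        denom = r + b * b * p_next
        # minimizer over u of q*x^2 + r*u^2 + p_next*(a*x + b*u)^2 is u = -K[t]*x
        K[t] = a * b * p_next / denom
        # Riccati update: substitute the minimizing u back into the cost-to-go
        P[t] = q + a * a * p_next - (a * b * p_next) ** 2 / denom
    return K, P


def rollout(a, b, K, x0):
    """Simulate the closed loop x[t+1] = (a - b * K[t]) * x[t] from x0."""
    xs = [x0]
    for k in K:
        xs.append((a - b * k) * xs[-1])
    return xs


if __name__ == "__main__":
    K, P = lqr_scalar(a=1.0, b=1.0, q=1.0, r=1.0, qf=1.0, horizon=50)
    print(P[0], K[0])
    print(rollout(1.0, 1.0, K, x0=1.0)[:5])
```

With a = b = q = r = 1 and a long horizon, P[0] approaches the fixed point of the Riccati update, P = 1 + P - P^2/(1 + P), whose positive root is (1 + sqrt(5))/2 ≈ 1.618, and K[0] approaches P/(1 + P) ≈ 0.618; the closed-loop factor a - bK ≈ 0.382 makes the state decay geometrically.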

Tentative Syllabus:

  • Introduction to reinforcement learning [1]
  • Dynamic programming methods [1, 2]
  • Optimal control methods [2, 3]
  • Temporal difference methods [1]
  • Q-Learning [1]
  • Problems of value-function-based RL methods
  • Function Approximation for RL [1]
  • Incremental Function Approximation Methods for RL [4, 5]
  • Least Squares Methods [6]
  • Direct Policy Learning: REINFORCE [7]
  • Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
  • Natural Policy Gradient Methods [9]
  • Probabilistic Reinforcement Learning with Reward-Weighted Averaging [10, 11]
  • Q-Learning on Trajectories [12]
  • Path Integral Approaches to Reinforcement Learning I [13]
  • Path Integral Approaches to Reinforcement Learning II
  • Dynamic Bayesian Networks for RL [14]
  • Gaussian Processes in Reinforcement Learning [5]
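The temporal-difference and Q-learning topics listed above can be illustrated with a small tabular example. The following Python sketch is not taken from the course; the chain task, the step size alpha, discount gamma, exploration rate epsilon, and the function name q_learning_chain are hypothetical choices for illustration. It applies the one-step Q-learning update Q(s,a) ← Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)] described in Sutton and Barto [1], with epsilon-greedy exploration.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning on a toy deterministic chain (hypothetical example task).

    States 0..n_states-1, start in state 0; action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode; all other rewards are 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def step(s, a):
        s_next = max(s - 1, 0) if a == 0 else s + 1
        reward = 1.0 if s_next == goal else 0.0
        return s_next, reward, s_next == goal

    def greedy(s):
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])  # random tie-breaking

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # epsilon-greedy behaviour policy
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s_next, r, done = step(s, a)
            # target bootstraps from the greedy value of the next state (0 if terminal)
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])  # temporal-difference update
            s = s_next
            if done:
                break
    return Q


if __name__ == "__main__":
    Q = q_learning_chain()
    policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(len(Q) - 1)]
    print(policy)  # learned greedy action per non-terminal state
```

After training, the greedy policy should move right in every non-terminal state, and Q(s, right) should be close to the optimal value gamma^(n-2-s), e.g. 0.9^3 = 0.729 for s = 0 with five states.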


  1. Sutton, R. S.; Barto, A. G. (1998). Reinforcement Learning: An Introduction, Adaptive Computation and Machine Learning, pp. xviii, 322, MIT Press
    [Keywords: Reinforcement learning (Machine learning)]
  2. Dyer, P.; McReynolds, S. R. (1970). The Computation and Theory of Optimal Control, Academic Press
    [Keywords: dynamic programming,optimal control]
  3. Theodorou, E.; Tassa, Y.; Todorov, E. (2010). Stochastic Differential Dynamic Programming, in: Proceedings of the American Control Conference (ACC 2010)
    [Keywords: Stochastic Differential Dynamic Programming,Second Order Optimal Control]

    Morimoto, J.; Atkeson, C. G. (2003). Minimax differential dynamic programming: an application to robust biped walking, in: Becker, S.; Thrun, S.; Obermayer, K. (eds.), Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press
    [Keywords: reinforcement learning, trajectory optimization, differential dynamic programming]
  4. Schaal, S.; Atkeson, C. G. (1998). Constructive incremental learning from only local information, Neural Computation, 10, 8, pp. 2047-2084
    [Keywords: statistical learning, nonparametric regression, distance metric, incremental learning, on-line learning, supersmoothing]
  5. Rasmussen, C. E.; Williams, C. K. I. (2006). Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning, pp. xviii, 248, MIT Press
    [Keywords: Gaussian processes, data processing, machine learning, mathematical models]
  6. Boyan, J. (1999). Least-squares temporal difference learning, in: Proceedings of the Sixteenth International Conference on Machine Learning, pp. 49-56, Morgan Kaufmann
  7. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 8, pp. 229-256
    [Keywords: stochastic reinforcement learning, non delayed]
  8. Peters, J.; Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, 4, pp. 682-697
    [Keywords: Reinforcement learning, Policy gradient methods, Natural gradients, Natural Actor-Critic, Motor skills, Motor primitives]
  9. Kober, J.; Peters, J. (2011). Policy Search for Motor Primitives in Robotics, Machine Learning, 84, 1-2, pp. 171-203

    Kober, J.; Peters, J. (2009). Policy Search for Motor Primitives in Robotics, Advances in Neural Information Processing Systems 21 (NIPS 2008), Cambridge, MA: MIT Press
  10. Neumann, G.; Peters, J. (2009). Fitted Q-iteration by Advantage Weighted Regression, Advances in Neural Information Processing Systems 21 (NIPS 2008), Cambridge, MA: MIT Press
  11. Toussaint, M.; Storkey, A. (2006). Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 23rd International Conference on Machine Learning (ICML 2006)
Designed by: Nerses Ohanyan & Jan Peters
Page last modified on January 13, 2012, at 02:38 AM