Reinforcement Learning Research Field

AKA: Reinforcement Learning Discipline.
See: John Tsitsiklis, Machine Learning, Software Agent, Action Selection, Supervised Learning, Unsupervised Learning, Markov Decision Process, Dynamic Programming, Dimitri Bertsekas.

References

(Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reinforcement_learning Retrieved:2019-5-9.
- Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Reinforcement learning is considered as one of three machine learning paradigms, alongside supervised learning and unsupervised learning.
  It differs from supervised learning in that labelled input/output pairs need not be presented, and sub-optimal actions need not be explicitly corrected. Instead the focus is finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).^[1]
  The environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques.^[2] ^[3] The main difference between the classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP and they target large MDPs where exact methods become infeasible.

↑ Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W. (1996). "Reinforcement Learning: A Survey". Journal of Artificial Intelligence Research. 4: 237–285. arXiv:cs/9605103. doi:10.1613/jair.301. Archived from the original on 2001-11-20.
↑ Dimitri P. Bertsekas and John N. Tsitsiklis. "Neuro-Dynamic Programming", Athena Scientific, 1996,[1]
↑ Dimitri P. Bertsekas. "Dynamic Programming and Optimal Control: Approximate Dynamic Programming, Vol.II", Athena Scientific, 2012,[2]