# Reinforcement Learning System

A Reinforcement Learning System is an online reward-maximization system that implements a reinforcement learning algorithm to solve a reinforcement learning task (to learn a policy that maximizes reward from feedback data).

**Context:**
- It can be based on a Sequential Decision-Making System.
- It can be developed using a Reinforcement Learning Platform.

**Example(s):**
- an Apprenticeship Learning System,
- an Inverse Reinforcement Learning System,
- an Instance-Based Reinforcement Learning System,
- an Average-Reward Reinforcement Learning System,
- a Distributed Reinforcement Learning System,
- a Temporal Difference Learning System,
- a Q-Learning System,
- a SARSA System,
- a Relational Reinforcement Learning System,
- a Gaussian Process Reinforcement Learning System,
- a Hierarchical Reinforcement Learning System,
- an Associative Reinforcement Learning System,
- a Bayesian Reinforcement Learning System,
- a Radial Basis Function Network,
- a Policy Gradient Reinforcement Learning System,
- a Least Squares Reinforcement Learning System,
- an Evolutionary Reinforcement Learning System,
- a Reward Shaping System,
- a PAC-MDP Learning System,
- a Reinforcement Learning-based Recommendation System,
- a Deep Reinforcement Learning System, such as: AlphaGo,
- a CogitAI Continua SaaS Platform [1],
- …

**Counter-Example(s):**
- a Supervised Learning System,
- an Unsupervised Learning System.

**See:** Active Learning System, Online Learning System, Machine Learning System, Value Function Approximation System, Markov Decision Process.

## References

### 2017

- (Stone, 2017) ⇒ Stone, P. (2017). "Reinforcement Learning." In: Sammut, C., & Webb, G.I. (eds.) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA, pp. 1088-1090.
- QUOTE: Reinforcement Learning describes a large class of learning problems characteristic of autonomous agents interacting in an environment: sequential decision-making problems with delayed reward. Reinforcement-learning algorithms seek to learn a policy (mapping from states to actions) that maximizes the reward received over time.
Unlike in supervised learning problems, in reinforcement-learning problems, there are no labeled examples of correct and incorrect behavior. However, unlike unsupervised learning problems, a reward signal can be perceived.
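The policy-learning loop Stone describes can be sketched with tabular Q-learning (one of the example systems listed above). This is an illustrative sketch only: the `ChainEnv` toy environment and its `reset`/`step` interface are assumptions made for the example, not an API from any cited source.

```python
import random

class ChainEnv:
    """Hypothetical toy 5-state chain: 'right' eventually reaches state 4,
    which pays reward 1; 'left' moves back toward state 0."""
    actions = ("left", "right")

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        self.s = min(self.s + 1, 4) if action == "right" else max(self.s - 1, 0)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = {}  # maps (state, action) -> estimated cumulative discounted reward
    for _ in range(episodes):
        state = env.reset()
        for _ in range(200):  # cap episode length
            # Epsilon-greedy selection: explore occasionally, else exploit.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                vals = [q.get((state, a), 0.0) for a in env.actions]
                best = max(vals)
                action = random.choice(
                    [a for a, v in zip(env.actions, vals) if v == best])
            next_state, reward, done = env.step(action)
            # Temporal-difference update driven by the reward signal alone --
            # no labeled examples of correct behavior are needed.
            best_next = max(q.get((next_state, a), 0.0) for a in env.actions)
            target = reward + (0.0 if done else gamma * best_next)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = next_state
            if done:
                break
    return q

random.seed(0)
q = q_learning(ChainEnv())
# The learned greedy policy maps each state to its highest-valued action.
policy = {s: max(ChainEnv.actions, key=lambda a: q.get((s, a), 0.0))
          for s in range(4)}
```

After training, the greedy policy extracted from the Q-table moves right toward the rewarding terminal state, illustrating the state-to-action mapping that the quote calls a policy.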

- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Reinforcement_learning Retrieved: 2017-12-24.
	QUOTE: **Reinforcement learning** (**RL**) is an area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take *actions* in an *environment* so as to maximize some notion of cumulative *reward*. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called *approximate dynamic programming*. The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.
	In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible. Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Instead the focus is on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).

The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
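The exploration/exploitation balance mentioned above is classically illustrated with an epsilon-greedy agent on a multi-armed bandit. A minimal sketch follows; the arm payout probabilities are made-up values chosen for the example.

```python
import random

def run_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit.
    arm_probs are hypothetical per-arm reward probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))  # explore: try a random arm
        else:
            arm = values.index(max(values))      # exploit: best current estimate
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample-mean estimate for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
```

With enough steps the agent's estimates single out the best arm, and exploitation concentrates pulls on it while the small epsilon keeps testing the alternatives.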