Model-Free Reinforcement Learning Algorithm
A Model-Free Reinforcement Learning Algorithm is a reinforcement learning algorithm that learns a policy or value function directly from sampled experience, without using a model of the environment's transition probabilities or reward function.
- Example(s): a Q-Learning Algorithm, DQN, DDPG, A3C, TRPO, PPO, TD3, SAC.
- Counter-Example(s): a Model-Based Reinforcement Learning Algorithm, such as Dyna-Q.
- See: Markov Decision Process, Trial And Error.
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Model-free_(reinforcement_learning) Retrieved:2020-12-10.
- In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm which does not use the transition probability distribution (and the reward function) associated with the Markov decision process (MDP), which, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". A model-free RL algorithm can be thought of as an "explicit" trial-and-error algorithm. An example of a model-free algorithm is Q-learning.
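The Q-learning example mentioned above can be summarized in a short sketch. The following is a minimal, illustrative tabular Q-learning loop, assuming a Gym-style environment with a discrete action space and reset()/step() methods; the environment interface, hyperparameters, and variable names are assumptions for illustration, not part of the cited source.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learns Q(s, a) from sampled transitions only,
    never querying the MDP's transition probabilities or reward function."""
    q = defaultdict(float)                     # Q-table: (state, action) -> value
    actions = list(range(env.action_space.n))  # assumes a discrete action space

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection: explore by trial and error.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            next_state, reward, done, _ = env.step(action)

            # Model-free update: bootstrap from the sampled next state only.
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next * (not done)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q
```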
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Model-free_(reinforcement_learning)#Key_model-free_reinforcement_learning_algorithms Retrieved:2020-12-10.
| Algorithm | Description | Model | Policy | Action Space | State Space | Operator |
|---|---|---|---|---|---|---|
| DQN | Deep Q Network | Model-Free | Off-policy | Discrete | Continuous | Q-value |
| DDPG | Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
| A3C | Asynchronous Advantage Actor-Critic Algorithm | Model-Free | On-policy | Continuous | Continuous | Advantage |
| TRPO | Trust Region Policy Optimization | Model-Free | On-policy | Continuous | Continuous | Advantage |
| PPO | Proximal Policy Optimization | Model-Free | On-policy | Continuous | Continuous | Advantage |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
| SAC | Soft Actor-Critic | Model-Free | Off-policy | Continuous | Continuous | Advantage |
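The "Operator" column above distinguishes algorithms that regress toward a Q-value target (DQN, DDPG, TD3) from those that weight policy updates by an advantage estimate (A3C, TRPO, PPO, SAC). A minimal sketch of the two quantities, using illustrative one-step formulas and variable names that are assumptions rather than part of the cited table:

```python
def q_value_target(reward, gamma, max_next_q, done):
    # One-step bootstrapped Q-value target: r + gamma * max_a' Q(s', a').
    return reward + gamma * max_next_q * (not done)

def advantage(q_value, state_value):
    # Advantage of an action over the state's baseline: A(s, a) = Q(s, a) - V(s).
    return q_value - state_value
```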