Model-Free Reinforcement Learning Algorithm
A Model-Free Reinforcement Learning Algorithm is a reinforcement learning algorithm that learns a policy or value function directly from sampled experience, without using a model of the environment's transition probabilities or reward function.
- Example(s): a Q-Learning Algorithm, DQN, DDPG, A3C, TRPO, PPO, TD3, SAC.
- Counter-Example(s): a Model-Based Reinforcement Learning Algorithm, such as Dyna-Q.
- See: Markov Decision Process, Trial And Error.
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Model-free_(reinforcement_learning) Retrieved:2020-12-10.
- In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm which does not use the transition probability distribution (and the reward function) associated with the Markov decision process (MDP), which, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". A model-free RL algorithm can be thought of as an "explicit" trial-and-error algorithm. An example of a model-free algorithm is Q-learning.
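The Q-learning example mentioned above can be summarized in a short sketch. The following is a minimal, illustrative tabular Q-learning loop, assuming a Gym-style environment with a discrete action space and reset()/step() methods; the environment interface, hyperparameters, and variable names are assumptions for illustration, not part of the cited source.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learns Q(s, a) from sampled transitions only,
    never querying the MDP's transition probabilities or reward function."""
    q = defaultdict(float)                     # Q-table: (state, action) -> value
    actions = list(range(env.action_space.n))  # assumes a discrete action space

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection: explore by trial and error.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            next_state, reward, done, _ = env.step(action)

            # Model-free update: bootstrap from the sampled next state only.
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next * (not done)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q
```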
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Model-free_(reinforcement_learning)#Key_model-free_reinforcement_learning_algorithms Retrieved:2020-12-10.
| Algorithm | Description | Model | Policy | Action Space | State Space | Operator |
|---|---|---|---|---|---|---|
| DQN | Deep Q Network | Model-Free | Off-policy | Discrete | Continuous | Q-value |
| DDPG | Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
| A3C | Asynchronous Advantage Actor-Critic Algorithm | Model-Free | On-policy | Continuous | Continuous | Advantage |
| TRPO | Trust Region Policy Optimization | Model-Free | On-policy | Continuous | Continuous | Advantage |
| PPO | Proximal Policy Optimization | Model-Free | On-policy | Continuous | Continuous | Advantage |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
| SAC | Soft Actor-Critic | Model-Free | Off-policy | Continuous | Continuous | Advantage |
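The "Operator" column above distinguishes algorithms that regress toward a Q-value target (DQN, DDPG, TD3) from those that weight policy updates by an advantage estimate (A3C, TRPO, PPO, SAC). A minimal sketch of the two quantities, using illustrative one-step formulas and variable names that are assumptions rather than part of the cited table:

```python
def q_value_target(reward, gamma, max_next_q, done):
    # One-step bootstrapped Q-value target: r + gamma * max_a' Q(s', a').
    return reward + gamma * max_next_q * (not done)

def advantage(q_value, state_value):
    # Advantage of an action over the state's baseline: A(s, a) = Q(s, a) - V(s).
    return q_value - state_value
```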