Model-Free Reinforcement Learning Algorithm

From GM-RKB
Revision as of 21:21, 18 December 2022

A Model-Free Reinforcement Learning Algorithm is a reinforcement learning algorithm that learns a value function or policy directly from sampled experience, without constructing an explicit model of the environment's transition dynamics or reward function.
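As a minimal sketch of the model-free idea, tabular Q-learning (one of the algorithms in the family listed below) updates action values from individual observed transitions and never estimates transition probabilities or a reward model. The `env_step` interface and all parameter values here are illustrative assumptions, not taken from the source.

```python
import random

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch.

    env_step(state, action) -> (next_state, reward, done) is an assumed
    environment interface; episodes always start in state 0.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection over the current Q estimates
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env_step(state, action)
            # Model-free update: bootstrap only from the sampled transition,
            # never from an explicit model of the environment.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```

Because the update uses `max` over next-state values rather than the action actually taken, this is an off-policy method, like DQN in the table below.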



References

2020

{| class="wikitable"
! Algorithm !! Description !! Model !! Policy !! Action Space !! State Space !! Operator
|-
| DQN || [[Deep Q Network]] || Model-Free || Off-policy || Discrete || Continuous || Q-value
|-
| DDPG || [[Deep Deterministic Policy Gradient]] || Model-Free || Off-policy || Continuous || Continuous || Q-value
|-
| A3C || [[Asynchronous Advantage Actor-Critic Algorithm]] || Model-Free || On-policy || Continuous || Continuous || Advantage
|-
| TRPO || [[Trust Region Policy Optimization]] || Model-Free || On-policy || Continuous || Continuous || Advantage
|-
| PPO || [[Proximal Policy Optimization]] || Model-Free || On-policy || Continuous || Continuous || Advantage
|-
| TD3 || [[Twin Delayed Deep Deterministic Policy Gradient]] || Model-Free || Off-policy || Continuous || Continuous || Q-value
|-
| SAC || [[Soft Actor-Critic]] || Model-Free || Off-policy || Continuous || Continuous || Advantage
|}
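The Operator column distinguishes algorithms that learn action values Q(s,a) from those that estimate the advantage A(s,a) = Q(s,a) - V(s), i.e. how much better an action is than the policy's average at that state. A small sketch of that relationship, with illustrative names not taken from the source:

```python
def advantage(q_values, policy_probs):
    """Advantage estimates from per-action Q-values.

    q_values: Q(s, a) for each action a in a fixed state s.
    policy_probs: pi(a | s) for each action (must sum to 1).
    """
    # V(s) is the expectation of Q(s, a) under the policy.
    v = sum(p * q for p, q in zip(policy_probs, q_values))
    # A(s, a) = Q(s, a) - V(s): positive means better than average.
    return [q - v for q in q_values]
```

Advantage-based methods such as A3C, TRPO, and PPO use this centered quantity to reduce the variance of their policy-gradient updates.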