Group Relative Policy Optimization (GRPO) Algorithm

From GM-RKB

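A Group Relative Policy Optimization (GRPO) Algorithm is a reinforcement learning algorithm that optimizes token-level agent policies by sampling a group of outputs for the same input and scoring each output's reward relative to the rest of its group, which makes training more efficient by removing the need for a separately learned value model.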
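The group-based relative reward comparison can be illustrated with a minimal sketch. It is not a full GRPO implementation: it covers only the step that turns the rewards of one sampled group into relative advantages, and it omits the policy update itself. The function name `group_relative_advantages`, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for illustration; implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group.

    rewards: scalar rewards for outputs sampled from the same input.
    eps: small constant (an assumption here) that avoids division by zero
         when every output in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (illustrative choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled outputs with binary rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Outputs that score above the group mean receive a positive advantage and outputs below it receive a negative one, so the group itself serves as the baseline. When all rewards in a group are identical, every advantage is zero and that group contributes no learning signal.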