Absolute Zero Reasoner (AZR)

An Absolute Zero Reasoner (AZR) is a self-play reinforcement learning paradigm that enables language models to simultaneously generate reasoning tasks and learn to solve them without requiring any human-curated data.
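
The propose-and-solve loop described above can be illustrated with a toy sketch. This is not the AZR implementation: the addition tasks, the lookup-table "solver", and the names `propose`, `solve`, `verify`, and `self_play` are all hypothetical stand-ins chosen for illustration, and the table update merely substitutes for a real reinforcement learning update on a language model. What it shows is only the shape of the loop, in which tasks come from the system itself and rewards come from an automatic check rather than from a human-curated dataset.

```python
import random


def propose(rng, max_operand):
    """Toy proposer (hypothetical): sample an addition task, no dataset involved."""
    return (rng.randint(0, max_operand), rng.randint(0, max_operand))


def solve(task, known_sums):
    """Toy solver (hypothetical): answer from what has been learned, else guess 0."""
    return known_sums.get(task, 0)


def verify(task, answer):
    """Automatic check standing in for a verifiable environment."""
    a, b = task
    return answer == a + b


def self_play(steps=200, max_operand=3, seed=0):
    """Run the propose -> solve -> verify -> update loop and return per-step rewards."""
    rng = random.Random(seed)
    known_sums = {}  # the solver's "parameters" in this toy setting
    rewards = []
    for _ in range(steps):
        task = propose(rng, max_operand)
        answer = solve(task, known_sums)
        reward = 1 if verify(task, answer) else 0
        rewards.append(reward)
        if reward == 0:
            # Stand-in for a policy update: store the verified result so the
            # solver succeeds on this task next time. A real system would
            # instead adjust model weights from the reward signal.
            known_sums[task] = task[0] + task[1]
    return rewards


if __name__ == "__main__":
    rewards = self_play()
    print("solved", sum(rewards), "of", len(rewards), "self-proposed tasks")
```

Because each distinct task can fail at most once before the toy solver memorizes it, the success rate rises over time, which loosely mirrors the idea of improving through self-generated practice.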