2024 The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
- (Akyürek et al., 2024) ⇒ Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, and Jacob Andreas. (2024). “The Surprising Effectiveness of Test-Time Training for Abstract Reasoning.” doi:10.48550/arXiv.2411.07279
Subject Headings:
Notes
Cited By
Quotes
Abstract
Language models have shown impressive performance on tasks within their training distribution, but often struggle with novel problems requiring complex reasoning. We investigate the effectiveness of test-time training (TTT) -- updating model parameters temporarily during inference using a loss derived from input data -- as a mechanism for improving models' reasoning capabilities, using the Abstraction and Reasoning Corpus (ARC) as a benchmark. Through systematic experimentation, we identify three crucial components for successful TTT: (1) initial finetuning on similar tasks; (2) auxiliary task format and augmentations; (3) per-instance training. TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models; applying TTT to an 8B-parameter language model, we achieve 53% accuracy on the ARC's public validation set, improving the state-of-the-art by nearly 25% for public and purely neural approaches. By ensembling our method with recent program generation approaches, we get SoTA public validation accuracy of 61.9%, matching the average human score. Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models; additional test-time compute applied to continued training on few-shot examples can also be extremely effective.
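The per-instance mechanism described above can be illustrated with a deliberately tiny sketch: at inference time, the base parameters are copied, briefly fine-tuned on the test instance's few-shot demonstration pairs, used to answer the query, and then discarded. This is a minimal toy in NumPy with a linear model and a mean-squared-error loss; all names and hyperparameters here are illustrative and not from the paper, which applies TTT to an 8B-parameter language model with augmented auxiliary tasks.

```python
import numpy as np

def predict(w, x):
    """Toy 'model': a linear map."""
    return x @ w

def ttt_predict(w_base, demos_x, demos_y, query_x, steps=100, lr=0.1):
    """Per-instance TTT sketch: gradient descent on this instance's
    demonstration pairs, starting from (a copy of) the base weights."""
    w = w_base.copy()  # temporary copy; the base weights are never modified
    for _ in range(steps):
        # Gradient of the mean-squared error over the few-shot demonstrations.
        grad = demos_x.T @ (demos_x @ w - demos_y) / len(demos_x)
        w -= lr * grad
    return predict(w, query_x)  # the adapted w is discarded after this instance

# Base model fixed ahead of time for the "training distribution" y = 1*x.
w_base = np.array([1.0])

# Novel test task y = 3*x, conveyed only through few-shot demonstrations.
demos_x = np.array([[1.0], [2.0], [3.0]])
demos_y = np.array([3.0, 6.0, 9.0])

base_pred = predict(w_base, np.array([4.0]))  # base model misses the new task
ttt_pred = ttt_predict(w_base, demos_x, demos_y, np.array([4.0]))
```

Here `ttt_pred` approaches the novel task's answer (3 * 4 = 12) while `base_pred` stays at the base model's answer (4), showing why temporary per-instance updates can help on out-of-distribution inputs.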
References
| Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|
| Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas | | 2024 | The Surprising Effectiveness of Test-Time Training for Abstract Reasoning | | | | 10.48550/arXiv.2411.07279 | | 2024 |