2024 The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
- (Akyürek et al., 2024) ⇒ Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, and Jacob Andreas. (2024). “The Surprising Effectiveness of Test-Time Training for Abstract Reasoning.” doi:10.48550/arXiv.2411.07279
Subject Headings:
Notes
Cited By
Quotes
Abstract
Language models have shown impressive performance on tasks within their training distribution, but often struggle with novel problems requiring complex reasoning. We investigate the effectiveness of test-time training (TTT) -- updating model parameters temporarily during inference using a loss derived from input data -- as a mechanism for improving models' reasoning capabilities, using the Abstraction and Reasoning Corpus (ARC) as a benchmark. Through systematic experimentation, we identify three crucial components for successful TTT: (1) initial finetuning on similar tasks; (2) auxiliary task format and augmentations; (3) per-instance training. TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models; applying TTT to an 8B-parameter language model, we achieve 53% accuracy on the ARC's public validation set, improving the state-of-the-art by nearly 25% for public and purely neural approaches. By ensembling our method with recent program generation approaches, we get SoTA public validation accuracy of 61.9%, matching the average human score. Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models; additional test-time compute applied to continued training on few-shot examples can also be extremely effective.
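The per-instance mechanism described above can be illustrated with a deliberately tiny sketch: at inference time, the base parameters are copied, briefly fine-tuned on the test instance's few-shot demonstration pairs, used to answer the query, and then discarded. This is a minimal toy in NumPy with a linear model and a mean-squared-error loss; all names and hyperparameters here are illustrative and not from the paper, which applies TTT to an 8B-parameter language model with augmented auxiliary tasks.

```python
import numpy as np

def predict(w, x):
    """Toy 'model': a linear map."""
    return x @ w

def ttt_predict(w_base, demos_x, demos_y, query_x, steps=100, lr=0.1):
    """Per-instance TTT sketch: gradient descent on this instance's
    demonstration pairs, starting from (a copy of) the base weights."""
    w = w_base.copy()  # temporary copy; the base weights are never modified
    for _ in range(steps):
        # Gradient of the mean-squared error over the few-shot demonstrations.
        grad = demos_x.T @ (demos_x @ w - demos_y) / len(demos_x)
        w -= lr * grad
    return predict(w, query_x)  # the adapted w is discarded after this instance

# Base model fixed ahead of time for the "training distribution" y = 1*x.
w_base = np.array([1.0])

# Novel test task y = 3*x, conveyed only through few-shot demonstrations.
demos_x = np.array([[1.0], [2.0], [3.0]])
demos_y = np.array([3.0, 6.0, 9.0])

base_pred = predict(w_base, np.array([4.0]))  # base model misses the new task
ttt_pred = ttt_predict(w_base, demos_x, demos_y, np.array([4.0]))
```

Here `ttt_pred` approaches the novel task's answer (3 * 4 = 12) while `base_pred` stays at the base model's answer (4), showing why temporary per-instance updates can help on out-of-distribution inputs.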
References
| Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|
| Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas | | 2024 | The Surprising Effectiveness of Test-Time Training for Abstract Reasoning | | | | 10.48550/arXiv.2411.07279 | | 2024 |