2024 EfficientExplorationforLLMs


Subject Headings: Epistemic Neural Network, Double Thompson Sampling, RLHF.

Quotes

Abstract

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
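The query-selection step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the uncertainty model is approximated by a small ensemble of reward estimates (one row per plausible reward function), and the function name `double_thompson_sample`, the `max_tries` parameter, and the uniform-random fallback are hypothetical choices made for illustration.

```python
import numpy as np

def double_thompson_sample(reward_samples, rng, max_tries=30):
    """Pick two distinct candidate responses to show a human rater.

    reward_samples: array of shape (num_models, num_responses). Each row
    is one plausible reward function scored on every candidate response.
    ASSUMPTION: an ensemble stands in for the epistemic neural network.
    """
    num_models, num_responses = reward_samples.shape
    if num_responses < 2:
        raise ValueError("need at least two candidate responses")

    # First draw: sample one plausible reward function, take its best response.
    first = int(np.argmax(reward_samples[rng.integers(num_models)]))

    # Second draw: resample reward functions until a different response wins.
    for _ in range(max_tries):
        second = int(np.argmax(reward_samples[rng.integers(num_models)]))
        if second != first:
            return first, second

    # ASSUMPTION: if every resample agrees, fall back to a uniformly random
    # different response so the pair is still distinct.
    others = [i for i in range(num_responses) if i != first]
    return first, int(rng.choice(others))

rng = np.random.default_rng(0)
scores = rng.normal(size=(10, 5))   # 10 hypothetical models, 5 responses
pair = double_thompson_sample(scores, rng)
```

When the sampled reward functions disagree about which response is best, the pair tends to contain two plausible winners, which is where a human preference label is most informative. When they all agree, the fallback keeps the query well-formed.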

References

Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, and Benjamin Van Roy. (2024). "Efficient Exploration for LLMs." doi:10.48550/arXiv.2402.00396