2024 EfficientExplorationforLLMs

Subject Headings: Epistemic Neural Network, Double Thompson Sampling, RLHF.

Notes

It demonstrates the substantial benefits of efficient exploration in gathering human feedback to improve large language models.
It compares passive exploration with several active exploration algorithms, highlighting the effectiveness of double Thompson sampling with an epistemic neural network.
Its experimentation pipeline uses Anthropic datasets, Gemini language models, and a human feedback simulator.
It uses two reward model architectures: a point-estimate model and an epistemic neural network (ENN) that additionally estimates uncertainty about the reward.
It shows that active exploration significantly reduces the number of queries required to achieve high levels of performance.
It supports these claims with empirical results and argues that efficient exploration could accelerate the attainment of superhuman creativity by decades.
It suggests future work in exploring more complex ENN architectures, multiturn dialog exploration, and tuning more of the LLM torso.
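The pairing of an uncertainty-aware reward model with double Thompson sampling, noted above, can be illustrated with a minimal Python sketch. It assumes a hypothetical `EnsembleRewardModel`: K linear reward heads over fixed response embeddings, where the epistemic index z simply selects a head. This ensemble is a simpler stand-in for the ENN used in the paper, not the authors' architecture. The Bradley-Terry update, the learning rate, the embeddings, and the random fallback in `double_thompson_sampling` are likewise illustrative assumptions.

```python
import numpy as np


class EnsembleRewardModel:
    """Hypothetical ENN stand-in: K linear reward heads over response embeddings.

    The epistemic index z picks one head; disagreement between heads plays
    the role of epistemic uncertainty. (The paper uses an epinet instead.)
    """

    def __init__(self, dim: int, num_heads: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Different random initializations make the heads disagree before data arrives.
        self.weights = rng.normal(size=(num_heads, dim))

    @property
    def num_heads(self) -> int:
        return self.weights.shape[0]

    def reward(self, embeddings: np.ndarray, z: int) -> np.ndarray:
        # embeddings: (num_responses, dim) -> rewards: (num_responses,)
        return embeddings @ self.weights[z]

    def update(self, preferred: np.ndarray, rejected: np.ndarray, lr: float = 0.1) -> None:
        # One SGD step per head on the Bradley-Terry loss -log sigmoid(r(preferred) - r(rejected)).
        diff = preferred - rejected
        margins = self.weights @ diff
        # sigmoid(-margin), written via tanh for numerical stability.
        scale = 0.5 * (1.0 - np.tanh(margins / 2.0))
        self.weights += lr * scale[:, None] * diff[None, :]


def double_thompson_sampling(model, embeddings, rng, max_attempts=30):
    """Return indices of two distinct candidate responses to show a rater."""
    n = embeddings.shape[0]
    if n < 2:
        raise ValueError("need at least two candidate responses")
    # First response: best under one sampled epistemic index.
    z1 = rng.integers(model.num_heads)
    first = int(np.argmax(model.reward(embeddings, z1)))
    # Second response: resample indices until the best response differs.
    for _ in range(max_attempts):
        z2 = rng.integers(model.num_heads)
        second = int(np.argmax(model.reward(embeddings, z2)))
        if second != first:
            return first, second
    # Fallback (an assumption of this sketch): a uniformly random different response.
    others = [i for i in range(n) if i != first]
    return first, int(rng.choice(others))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    candidates = rng.normal(size=(8, 16))  # made-up embeddings of 8 sampled responses
    model = EnsembleRewardModel(dim=16, num_heads=10, seed=1)
    a, b = double_thompson_sampling(model, candidates, rng)
    # Pretend the simulated rater preferred response a over response b.
    model.update(candidates[a], candidates[b])
    print("queried pair:", a, b)
```

The second response is chosen by resampling the epistemic index, so the queried pair concentrates on responses that are plausibly best under the model's current uncertainty. As preference data accumulates and the heads agree, the two choices tend to coincide more often, which is where the fallback applies.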


2024 EfficientExplorationforLLMs: Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, and Benjamin Van Roy. (2024). "Efficient Exploration for LLMs." doi:10.48550/arXiv.2402.00396