- AKA: Exploration Phase, PAC-MDP Learning, Efficient Exploration in Reinforcement Learning.
- It is emphasized at the beginning of a new task in machine learning algorithms.
- See: Exploitation Task, Learning Phase.
- (Langford, 2017) ⇒ Langford J. (2017) "Efficient Exploration in Reinforcement Learning". In: Sammut, C., Webb, G.I. (eds) "Encyclopedia of Machine Learning and Data Mining". Springer, Boston, MA
- QUOTE: An agent acting in a world makes observations, takes actions, and receives rewards for the actions taken. Given a history of such interactions, the agent must make the next choice of action so as to maximize the long-term sum of rewards. To do this well, an agent may take suboptimal actions which allow it to gather the information necessary to later take optimal or near-optimal actions with respect to maximizing the long-term sum of rewards. These information gathering actions are generally considered exploration actions.
- (Lipton et al., 2017) ⇒ Lipton, Z., Li, X., Gao, J., Li, L., Ahmed, F., & Deng, L. (2017). BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems. arXiv preprint arXiv:1711.05715.
- ABSTRACT: We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
- (Korein et al., 2014) ⇒ Korein, M., Coltin, B., & Veloso, M. (2014, May). "Constrained Scheduling of Robot Exploration Tasks". In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems (pp. 429-436). International Foundation for Autonomous Agents and Multiagent Systems.
- ABSTRACT: In order for an autonomous service robot to provide the best service possible for its users, it must have a considerable amount of knowledge of the environment in which it operates. Service robots perform requests for users and can learn information about their environment while performing these requests. We note that such requests do not take all of the robot's time, and propose that a robot could schedule additional exploration tasks in its spare time to gather data about its environment. The data gathered can then be used to model the environment, and the model can be used to improve the services provided. Such additional exploration is a constrained form of active learning, in which the robot evaluates the knowledge it can acquire and chooses observations to gather, while being constrained by its navigation and the time underlying the user requests it receives. We create a schedule of exploration tasks that meets the given constraints using a hill-climbing approach on a graph of tasks the robot can perform to gather observations. We test this method in simulation of a CoBot robot and find that it is able to learn an accurate model of its environment over time, leading to a near-optimal policy when scheduling user requests.
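The trade-off described in the Langford quote (taking possibly suboptimal actions to gather information) can be illustrated with ε-greedy, one of the common exploration strategies named in the Lipton et al. abstract. The following is a minimal sketch on a Bernoulli multi-armed bandit, not code from any of the cited works; the function name `epsilon_greedy_bandit`, its parameters, and the bandit setting itself are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon, steps, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (illustrative sketch only).

    true_means: hypothetical success probability of each arm.
    epsilon:    probability of taking a random (exploration) action.
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration action: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation action: pick the arm with the best current estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Bernoulli reward drawn from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        counts, estimates, total = epsilon_greedy_bandit([0.0, 1.0], eps, 1000)
        print(f"epsilon={eps}: counts={counts}, total_reward={total}")
```

With `epsilon=0.0` and arms `[0.0, 1.0]`, the agent in this sketch never leaves the first arm, because it never gathers the information that would reveal the better one; any positive `epsilon` lets it eventually discover the second arm and exploit it.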