Fully Observable Markov Decision Process


A Fully Observable Markov Decision Process is a Markov decision process that is also a fully observable process, i.e. one in which the agent knows the system's current state exactly at every decision epoch.



References

2012

  • (Mausam & Kolobov, 2012) ⇒ Mausam, and Andrey Kolobov. (2012). “Planning with Markov Decision Processes: An AI Perspective.” Morgan & Claypool Publishers. ISBN: 1608458865, 9781608458868.
    • QUOTE: A finite discrete-time fully observable MDP is a tuple (S, A, D, T, R), where:
      • S is the finite set of all possible states of the system, also called the state space;
      • A is the finite set of all actions an agent can take;
      • D is a finite or infinite sequence of natural numbers of the form (1, 2, 3, …, T_max) or (1, 2, 3, …) respectively, denoting the decision epochs, also called time steps, at which actions need to be taken;
      • T : S × A × S × D → [0, 1] is a transition function, a mapping specifying the probability T(s₁, a, s₂, t) of going to state s₂ if action a is executed when the agent is in state s₁ at time step t;
      • R : S × A × S × D → ℝ is a reward function that gives a finite numeric reward value R(s₁, a, s₂, t) obtained when the system goes from state s₁ to state s₂ as a result of executing action a at time step t.
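
The tuple definition above maps naturally onto a small data structure. Below is a minimal Python sketch of the finite-horizon case; the names (FiniteHorizonMDP, transition, successor_distribution, the two-state example) are illustrative assumptions, not from the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative sketch of the (S, A, D, T, R) tuple from the definition above.
# Finite horizon only: D = (1, 2, ..., T_max) is represented by `horizon`.
State = str
Action = str

@dataclass
class FiniteHorizonMDP:
    """A finite discrete-time fully observable MDP (S, A, D, T, R)."""
    states: List[State]    # S: finite state space
    actions: List[Action]  # A: finite action set
    horizon: int           # D: decision epochs 1..T_max
    # T(s1, a, s2, t): probability of reaching s2 from s1 via a at step t
    transition: Callable[[State, Action, State, int], float]
    # R(s1, a, s2, t): finite numeric reward for that transition
    reward: Callable[[State, Action, State, int], float]

    def successor_distribution(self, s1: State, a: Action, t: int) -> Dict[State, float]:
        """Distribution over next states; values must sum to 1 over S."""
        return {s2: self.transition(s1, a, s2, t) for s2 in self.states}

# Example: a tiny two-state, stationary MDP (T and R ignore the time step t).
def T(s1: State, a: Action, s2: State, t: int) -> float:
    table = {("s0", "go", "s1"): 0.9, ("s0", "go", "s0"): 0.1,
             ("s1", "go", "s1"): 1.0}
    return table.get((s1, a, s2), 0.0)

mdp = FiniteHorizonMDP(states=["s0", "s1"], actions=["go"], horizon=3,
                       transition=T,
                       reward=lambda s1, a, s2, t: 1.0 if s2 == "s1" else 0.0)
print(mdp.successor_distribution("s0", "go", t=1))  # {'s0': 0.1, 's1': 0.9}
```

Representing T and R as callables rather than tables keeps the sketch general: time-dependent dynamics (the D component) come for free, and a stationary MDP simply ignores the t argument, as the example does.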