Real-Time Dynamic Programming (RTDP) Algorithm: Difference between revisions

Revision as of 05:21, 12 June 2024

Context:
- It is an Adaptive Real-Time Dynamic Programming (ARTDP) Algorithm without the system identification component.
Example(s):
Counter-Example(s):
- Brown-UMBC Reinforcement Learning and Planning (BURLAP) Algorithm,
- Forward-Backward Algorithm.
See: Anytime Algorithm; Approximate Dynamic Programming; Reinforcement Learning.

(Sammut & Webb, 2017) ⇒ (2017) Real-Time Dynamic Programming. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA.
- QUOTE: Real-Time Dynamic Programming (RTDP) is the same as Adaptive Real-Time Dynamic Programming (ARTDP) without the system identification component(...)
  (...)RTDP combines strengths of heuristic search and DP. Like heuristic search – and unlike conventional DP – it does not have to evaluate the entire state space in order to produce an optimal solution. Like DP – and unlike most heuristic search algorithms – it is applicable to nondeterministic problems. Additionally, RTDP's performance as an anytime algorithm is better than conventional DP and heuristic search algorithms. ARTDP extends these strengths to problems for which a good model is not initially available.
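
The point in the quote above, that RTDP backs up only the states it actually visits rather than sweeping the whole state space, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the cited sources: the five-state line-world, the two actions `step` and `jump`, the unit action cost, the zero initial heuristic, and the function names `transitions`, `q_value`, and `rtdp`.

```python
import random

# Hypothetical toy problem (not from the cited sources):
# states 0..4 on a line, state 4 is the goal, every action costs 1.
GOAL = 4
ACTIONS = ("step", "jump")


def transitions(state, action):
    """Return [(probability, next_state)] for the assumed toy model."""
    if action == "step":   # move one state right with prob. 0.8, else stay
        return [(0.8, min(state + 1, GOAL)), (0.2, state)]
    # "jump": move two states right with prob. 0.5, else stay
    return [(0.5, min(state + 2, GOAL)), (0.5, state)]


def q_value(V, state, action):
    """Expected cost of `action` in `state`; unseen states default to 0."""
    return 1.0 + sum(p * V.get(s2, 0.0) for p, s2 in transitions(state, action))


def rtdp(start, trials=200, max_steps=100, seed=0):
    """Run greedy trials from `start`, backing up only the visited states."""
    rng = random.Random(seed)
    V = {}  # value table holds visited states only; 0 is an admissible heuristic
    for _ in range(trials):
        s = start
        for _ in range(max_steps):
            if s == GOAL:
                break
            # Bellman backup at the current state, using the greedy action.
            best = min(ACTIONS, key=lambda a: q_value(V, s, a))
            V[s] = q_value(V, s, best)
            # Sample the successor from the model and continue the trial.
            probs, succs = zip(*transitions(s, best))
            s = rng.choices(succs, weights=probs)[0]
    return V


if __name__ == "__main__":
    values = rtdp(start=2)
    # Only states reachable from the start state ever receive a value.
    print(sorted(values))
    print(round(values[2], 3))
```

Starting from state 2, states 0 and 1 are never touched, which is the contrast with conventional DP that the quote draws. The value table can also be read out after any number of trials, which is the sense in which this sketch behaves as an anytime procedure.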

(Sanner et al., 2009) ⇒ Scott Sanner, Robby Goetschalckx, Kurt Driessens, and Guy Shani (2009). "Bayesian Real-Time Dynamic Programming". In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09).

(Bonet & Geffner, 2003) ⇒ Blai Bonet and Hector Geffner (2003). "Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming". In: Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling (ICAPS), Vol. 3, pp. 12-21.

@@ Line 22: / Line 22: @@
 === 2009 ===
-* (Sanner et al., 2009) ⇒ [[Scott Sanner]], [[Robby Goetschalckx]], [[Kurt Driessens]], and [[Guy Shani]] (2009). [https://www.ijcai.org/Proceedings/09/Papers/297.pdf "Bayesian Real-Time Dynamic Programming"]. In: In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09).
+* (Sanner et al., 2009) ⇒ [[Scott Sanner]], [[Robby Goetschalckx]], [[Kurt Driessens]], and [[Guy Shani]] (2009). [https://www.ijcai.org/Proceedings/09/Papers/297.pdf "Bayesian Real-Time Dynamic Programming"]. In: In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09).
 === 2005 ===
-* (McMahan et al., 2005) ⇒ [[H. Brendan McMahan]], [[Maxim Likhachev]], and [[Geoffrey J. Gordon]] (2005). [http://www.cs.cmu.edu/~ggordon/mcmahan-likhachev-gordon.brtdp.pdf Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees]. In: In Proceedings of the 22nd International Conference on Machine learning (pp. 569-576).
+* (McMahan et al., 2005) ⇒ [[H. Brendan McMahan]], [[Maxim Likhachev]], and [[Geoffrey J. Gordon]] (2005). [http://www.cs.cmu.edu/~ggordon/mcmahan-likhachev-gordon.brtdp.pdf Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees]. In: In: Proceedings of the 22nd International Conference on Machine learning (pp. 569-576).
 === 2003 ===