Real-Time Dynamic Programming (RTDP) Algorithm: Difference between revisions
Jump to navigation
Jump to search
m (Text replace - " Geoffrey I. Webb" to " Geoffrey I. Webb") |
m (Text replacement - "]] " to "]] ") |
||
(28 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
A [[Real-Time Dynamic Programming (RTDP) Algorithm]] is a [[Dynamic Programming Algorithm]] that is enhanced by implementing a [[heuristic search algorithm]]. | |||
* <B>Context:</B> | |||
** It is an [[Adaptive Real-Time Dynamic Programming (ARTDP) Algorithm]] without a [[system]] [[identification]]. | |||
* <B>Example(s):</B> | |||
** [[Bayesian Real-Time Dynamic Programming (BRTDP) Algorithm]] ([[#2009|Sanner et al., 2009]]), | |||
** [[Bounded Real-Time Dynamic Programming (BRTDP) Algorithm]] ([[#2005|McMahan et al., 2005]]), | |||
** [[Labeled Real-Time Dynamic Programming (LRTDP) Algorithm]] ([[#2003|Bonet & Geffner, 2003]]), | |||
** … | |||
* <B>Counter-Example(s):</B> | |||
** [https://github.com/jmacglashan/burlap_examples/ Brown-UMBC Reinforcement Learning and Planning (BURLAP) Algorithm]], | |||
** [[Forward-Backward Algorithm]]. | |||
* <B>See:</B> [[Anytime Algorithm]]; [[Approximate Dynamic Programming]]; [[Reinforcement Learning]]. | |||
---- | ---- | ||
---- | ---- | ||
=== | == References == | ||
* ([[Sammut & Webb, | |||
=== 2017 === | |||
* ([[Sammut & Webb, 2017]]) ⇒ (2017) [https://link.springer.com/referenceworkentry/10.1007%2F978-1-4899-7687-1_701 Real-Time Dynamic Programming]. In: [[Sammut, C.]], [[Webb, G.I.]] (eds) [https://link.springer.com/referencework/10.1007/978-1-4899-7687-1 Encyclopedia of Machine Learning and Data Mining]. Springer, Boston, MA. | |||
** QUOTE: [[Real-Time Dynamic Programming|Real-Time Dynamic Programming (RTDP)]] is the same as [[Adaptive Real-Time Dynamic Programming (ARTDP)]] without the [[system identification]] component(...)<P>(...)[[RTDP]] combines strengths of [[heuristic search]] and [[DP]]. Like [[heuristic search]] – and unlike conventional DP – it does not have to evaluate the entire [[state space]] in order to produce an [[optimal solution]]. Like [[DP]] – and unlike most [[heuristic search algorithm]]s – it is applicable to [[nondeterministic problem]]s. Additionally, [[RTDP]]'s [[performance]] as an [[anytime algorithm]] is better than conventional [[DP]] and [[heuristic search algorithm]]s. [[ARTDP]] extends these strengths to problems for which a good model is not initially available. | |||
=== 2009 === | |||
* (Sanner et al., 2009) ⇒ [[Scott Sanner]], [[Robby Goetschalckx]], [[Kurt Driessens]], and [[Guy Shani]] (2009). [https://www.ijcai.org/Proceedings/09/Papers/297.pdf "Bayesian Real-Time Dynamic Programming"]. In: In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09). | |||
=== 2005 === | |||
* (McMahan et al., 2005) ⇒ [[H. Brendan McMahan]], [[Maxim Likhachev]], and [[Geoffrey J. Gordon]] (2005). [http://www.cs.cmu.edu/~ggordon/mcmahan-likhachev-gordon.brtdp.pdf Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees]. In: In: Proceedings of the 22nd International Conference on Machine learning (pp. 569-576). | |||
=== 2003 === | |||
* (Bonet & Geffner, 2003) ⇒ [[Blai Bonet]], and [[Hector Geffner]] (2003). [https://ftp.cs.ucla.edu/pub/stat_ser/R319.pdf "Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming"]. In: Proceedings of Thirteenth Interational Conference on Automated Planning and Scheduling (ICAPS), Vol. 3, pp. 12-21. | |||
---- | ---- | ||
__NOTOC__ | __NOTOC__ | ||
[[Category:Concept]] | |||
[[Category:Machine Learning]] |
Latest revision as of 02:40, 4 November 2024
A Real-Time Dynamic Programming (RTDP) Algorithm is a Dynamic Programming Algorithm that is enhanced by implementing a heuristic search algorithm.
- Context:
- It is an Adaptive Real-Time Dynamic Programming (ARTDP) Algorithm without a system identification.
- Example(s):
- Counter-Example(s):
- See: Anytime Algorithm; Approximate Dynamic Programming; Reinforcement Learning.
References
2017
- (Sammut & Webb, 2017) ⇒ (2017) Real-Time Dynamic Programming. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA.
- QUOTE: Real-Time Dynamic Programming (RTDP) is the same as Adaptive Real-Time Dynamic Programming (ARTDP) without the system identification component(...)
(...)RTDP combines strengths of heuristic search and DP. Like heuristic search – and unlike conventional DP – it does not have to evaluate the entire state space in order to produce an optimal solution. Like DP – and unlike most heuristic search algorithms – it is applicable to nondeterministic problems. Additionally, RTDP's performance as an anytime algorithm is better than conventional DP and heuristic search algorithms. ARTDP extends these strengths to problems for which a good model is not initially available.
- QUOTE: Real-Time Dynamic Programming (RTDP) is the same as Adaptive Real-Time Dynamic Programming (ARTDP) without the system identification component(...)
2009
- (Sanner et al., 2009) ⇒ Scott Sanner, Robby Goetschalckx, Kurt Driessens, and Guy Shani (2009). "Bayesian Real-Time Dynamic Programming". In: In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09).
2005
- (McMahan et al., 2005) ⇒ H. Brendan McMahan, Maxim Likhachev, and Geoffrey J. Gordon (2005). Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees. In: In: Proceedings of the 22nd International Conference on Machine learning (pp. 569-576).
2003
- (Bonet & Geffner, 2003) ⇒ Blai Bonet, and Hector Geffner (2003). "Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming". In: Proceedings of Thirteenth Interational Conference on Automated Planning and Scheduling (ICAPS), Vol. 3, pp. 12-21.