Contextual Multi-Armed Bandit Task

From GM-RKB

A Contextual Multi-Armed Bandit Task is a k-armed bandit task that includes a context vector ... where the rewarding systems are not independent.



References

2020


2016a

  • Bay Area Deep Learning School Day 2 https://youtu.be/9dXiAecyJrY?t=16m2s
    • QUOTE: Contextual bandits provide the agent a little less information than in supervised learning:
      • Environment samples input [math]\displaystyle{ x_t \sim \rho }[/math]
      • Agent takes action [math]\displaystyle{ \hat{y}_t = f(x_t) }[/math]
      • Agent receives cost [math]\displaystyle{ c_t \sim P(c_t | x_t, \hat{y}_t) }[/math] where P is an unknown probability distribution.
      • Environment asks agent a question, and gives her a noisy score on her answer.
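
The interaction loop quoted above (sample an input, choose an action, observe a noisy cost) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method from the talk: the two-context environment, the Bernoulli cost probabilities, the epsilon-greedy learning rule, and the names `sample_context`, `sample_cost`, and `run_bandit` are all assumptions introduced for the example.

```python
import random


def sample_context(rng):
    # Environment samples an input; here the input distribution is assumed
    # to be uniform over two discrete contexts.
    return rng.choice([0, 1])


def sample_cost(rng, x, y):
    # Noisy cost drawn from a distribution the agent does not know.
    # Assumption: cost is 1 with probability 0.1 if the action matches
    # the context, and with probability 0.9 otherwise.
    p_cost = 0.1 if y == x else 0.9
    return 1.0 if rng.random() < p_cost else 0.0


def run_bandit(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    contexts, actions = (0, 1), (0, 1)
    totals = {(x, a): 0.0 for x in contexts for a in actions}
    counts = {(x, a): 0 for x in contexts for a in actions}

    def mean_cost(x, a):
        # Untried pairs default to 0.0, which encourages trying them once.
        n = counts[(x, a)]
        return totals[(x, a)] / n if n else 0.0

    for _ in range(steps):
        x = sample_context(rng)
        if rng.random() < epsilon:
            y = rng.choice(actions)  # explore
        else:
            y = min(actions, key=lambda a: mean_cost(x, a))  # exploit
        # The agent only sees the cost of the action it took,
        # never the cost of the alternatives.
        c = sample_cost(rng, x, y)
        totals[(x, y)] += c
        counts[(x, y)] += 1

    # Learned policy: the lowest estimated-cost action for each context.
    return {x: min(actions, key=lambda a: mean_cost(x, a)) for x in contexts}


policy = run_bandit()
```

Unlike supervised learning, the agent in this sketch is never told which action was correct; it only receives the cost of the action it actually chose, which is why some exploration is needed.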

2016b

2010