Dialog State Tracking Challenge

From GM-RKB

A Dialog State Tracking Challenge (DSTC) is a benchmark task for dialog state tracking in human-computer spoken dialog systems.



References

2016

  • https://www.microsoft.com/en-us/research/event/dialog-state-tracking-challenge/#
    • QUOTE:
      • DSTC1 used human-computer dialogs in the bus timetable domain. Results were presented in a special session at SIGDIAL 2013. DSTC1 was organized by Jason D. Williams, Alan Black, Deepak Ramachandran, Antoine Raux.
      • DSTC2/3 used human-computer dialogs in the restaurant information domain. Results were presented in special sessions at SIGDIAL 2014 and IEEE SLT 2014. DSTC2 and 3 were organized by Matthew Henderson, Blaise Thomson, and Jason D. Williams.
      • DSTC4 used human-human dialogs in the tourist information domain. Results were presented at IWSDS 2015. DSTC4 was organized by Seokhwan Kim, Luis F. D’Haro, Rafael E Banchs, Matthew Henderson, and Jason D. Williams.
      • DSTC5 used human-human dialogs in the tourist information domain, where training dialogs were provided in one language, and test dialogs were in a different language. Results will be presented in a special session at IEEE SLT 2016. DSTC5 was organized by Seokhwan Kim, Luis F. D’Haro, Rafael E Banchs, Matthew Henderson, Jason D. Williams, and Koichiro Yoshino.

2014a

  • (Henderson et al., 2014) ⇒ Matthew Henderson, Blaise Thomson, and Jason D. Williams. (2014). “The Third Dialog State Tracking Challenge.” In: Spoken Language Technology Workshop (SLT), 2014 IEEE, pp. 324-329. IEEE. doi:10.1109/SLT.2014.7078595
    • ABSTRACT: In spoken dialog systems, dialog state tracking refers to the task of correctly inferring the user's goal at a given turn, given all of the dialog history up to that turn. This task is challenging because of speech recognition and language understanding errors, yet good dialog state tracking is crucial to the performance of spoken dialog systems. This paper presents results from the third Dialog State Tracking Challenge, a research community challenge task based on a corpus of annotated logs of human-computer dialogs, with a blind test set evaluation. The main new feature of this challenge is that it studied the ability of trackers to generalize to new entities - i.e. new slots and values not present in the training data. This challenge received 28 entries from 7 research teams. About half the teams substantially exceeded the performance of a competitive rule-based baseline, illustrating not only the merits of statistical methods for dialog state tracking but also the difficulty of the problem.
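      The task described in the abstract can be illustrated with a minimal sketch. The following is a hypothetical rule-based tracker (an illustration of the general idea, not the actual DSTC baseline system): at each turn it takes noisy spoken language understanding (SLU) slot-value hypotheses with confidence scores, and keeps the highest-confidence value seen so far for each slot.

```python
# Hypothetical minimal rule-based dialog state tracker (a sketch of the
# idea, not the official DSTC baseline): for each turn, take the SLU
# hypotheses for slot-value pairs with confidence scores, and retain the
# highest-confidence value observed so far per slot.

def track_dialog_state(turns):
    """turns: list of dicts mapping (slot, value) -> SLU confidence score."""
    state = {}  # slot -> (value, confidence)
    for slu_hypotheses in turns:
        for (slot, value), score in slu_hypotheses.items():
            # Overwrite only if this hypothesis is more confident than
            # what is currently tracked for the slot.
            if slot not in state or score > state[slot][1]:
                state[slot] = (value, score)
    return {slot: value for slot, (value, _) in state.items()}

# Example (hypothetical restaurant-domain dialog): the user asks for
# cheap food, then clarifies the cuisine in a later turn.
dialog = [
    {("pricerange", "cheap"): 0.9, ("food", "indian"): 0.4},
    {("food", "italian"): 0.8},
]
print(track_dialog_state(dialog))  # {'pricerange': 'cheap', 'food': 'italian'}
```

      Such simple confidence-based rules are what the challenge's competitive rule-based baseline exemplifies; the statistical trackers that exceeded it learn how to weigh and combine hypotheses rather than relying on a fixed overwrite rule.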

2014b

  • (Henderson et al., 2014) ⇒ Matthew Henderson, Blaise Thomson, and Jason Williams. (2014). “The Second Dialog State Tracking Challenge.” In: 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, vol. 263.
    • ABSTRACT: A spoken dialog system, while communicating with a user, must keep track of what the user wants from the system at each step. This process, termed dialog state tracking, is essential for a successful dialog system as it directly informs the system’s actions. The first Dialog State Tracking Challenge allowed for evaluation of different dialog state tracking techniques, providing common testbeds and evaluation suites. This paper presents a second challenge, which continues this tradition and introduces some additional features – a new domain, changing user goals and a richer dialog state. The challenge received 31 entries from 9 research groups.

      The results suggest that while large improvements on a competitive baseline are possible, trackers are still prone to degradation in mismatched conditions. An investigation into ensemble learning demonstrates the most accurate tracking can be achieved by combining multiple trackers.
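      The ensemble finding above can be sketched as follows. This is a hypothetical illustration of score-averaging ensemble tracking (not any participating team's actual system): each tracker outputs a per-slot distribution over values, the distributions are averaged, and the argmax value is taken per slot.

```python
# Hypothetical ensemble of dialog state trackers (an illustration of the
# combination idea from the abstract): average the per-slot value
# distributions produced by several trackers, then pick the most
# probable value for each slot.
from collections import defaultdict

def ensemble(tracker_outputs):
    """tracker_outputs: list of dicts slot -> {value: probability}."""
    combined = defaultdict(lambda: defaultdict(float))
    for output in tracker_outputs:
        for slot, dist in output.items():
            for value, prob in dist.items():
                combined[slot][value] += prob / len(tracker_outputs)
    # For each slot, return the value with the highest averaged score.
    return {slot: max(dist, key=dist.get) for slot, dist in combined.items()}

# Example: three trackers disagree on one turn; averaging resolves it.
trackers = [
    {"food": {"italian": 0.6, "indian": 0.4}},
    {"food": {"italian": 0.3, "indian": 0.7}},
    {"food": {"italian": 0.8, "indian": 0.2}},
]
print(ensemble(trackers))  # {'food': 'italian'}
```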

2013