ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Performance Metric

A ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Performance Metric is an intrinsic NLG performance measure that scores a generated text by its overlap with a gold-standard reference text.



References

2023

2022

  • (Jain et al., 2022) ⇒ Raghav Jain, Vaibhav Mavi, Anubhav Jangra, and Sriparna Saha. (2022). “WIDAR - Weighted Input Document Augmented ROUGE.” In: European Conference on Information Retrieval, pp. 304-321. Cham: Springer International Publishing.
    • ABSTRACT: The task of automatic text summarization has gained a lot of traction due to the recent advancements in machine learning techniques. However, evaluating the quality of a generated summary remains to be an open problem. The literature has widely adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as the standard evaluation metric for summarization. However, ROUGE has some long-established limitations; a major one being its dependence on the availability of good quality reference summary. In this work, we propose the metric WIDAR which in addition to utilizing the reference summary uses also the input document in order to evaluate the quality of the generated summary. The proposed metric is versatile, since it is designed to adapt the evaluation score according to the quality of the reference summary. The proposed metric correlates better than ROUGE by 26%, 76%, 82%, and 15%, respectively, in coherence, consistency, fluency, and relevance on human judgement scores provided in the SummEval dataset. The proposed metric is able to obtain comparable results with other state-of-the-art metrics while requiring a relatively short computational time (Implementation for WIDAR can be found at - https://github.com/Raghav10j/WIDAR).

2017

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/ROUGE_(metric)#Metrics Retrieved:2017-5-30.
    • The following five evaluation metrics [1] are available.
    • ROUGE-N: N-gram [2] based co-occurrence statistics.
    • ROUGE-L: Longest Common Subsequence (LCS) [3] based statistics. The longest common subsequence takes sentence-level structure similarity into account naturally and automatically identifies the longest in-sequence co-occurring n-grams.
    • ROUGE-W: Weighted LCS-based statistics that favor consecutive LCSes.
    • ROUGE-S: Skip-bigram [4] based co-occurrence statistics. Skip-bigram is any pair of words in their sentence order.
      • ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.
    • ROUGE can be downloaded from berouge download link.
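
    • The ROUGE-N, ROUGE-L, and ROUGE-S definitions listed above can be illustrated with a minimal Python sketch. It is not the official ROUGE toolkit: it assumes whitespace tokenization, no stemming or stopword removal, a single reference, and recall only, and the function names (ngram_counts, rouge_n_recall, lcs_length, rouge_l_recall, rouge_s_recall) are hypothetical names chosen for this illustration.

```python
from collections import Counter
from itertools import combinations


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: matched reference n-grams / total reference n-grams."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(ref.values())
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    return sum((cand & ref).values()) / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """ROUGE-L recall (sentence level): LCS length / reference length."""
    ref = reference.split()
    return lcs_length(candidate.split(), ref) / len(ref) if ref else 0.0


def rouge_s_recall(candidate, reference):
    """ROUGE-S recall: matched skip-bigrams / total reference skip-bigrams."""
    # combinations() yields every ordered word pair, with arbitrary gaps.
    cand = Counter(combinations(candidate.split(), 2))
    ref = Counter(combinations(reference.split(), 2))
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


reference = "police killed the gunman"
candidate = "police kill the gunman"
print(rouge_n_recall(candidate, reference, n=1))  # 0.75 (3 of 4 unigrams)
print(rouge_l_recall(candidate, reference))       # 0.75 (LCS: police the gunman)
print(rouge_s_recall(candidate, reference))       # 0.5  (3 of 6 skip-bigrams)
```

    • In this sketch the bigram recall for the same sentence pair is 1/3, because only "the gunman" matches, while ROUGE-L and ROUGE-S still credit the in-order words "police ... the gunman". This is the sense in which LCS and skip-bigram statistics capture sentence-level structure that contiguous n-grams miss.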

2004a

2004b