2022 TowardUnifyingTextSegmentationa


Subject Headings: Long Document Summarization, Text Segmentation, Longformer, Topical Coherence.

Notes

  • It addresses the challenge of summarizing long documents, such as scientific papers and spoken transcripts, which are complex due to their length and detailed structure.
  • It proposes a unified model named "Lodoss," integrating text segmentation and extractive summarization to enhance sentence representation and diversify sentence selection.
  • It leverages text segmentation to improve the summarization process, demonstrating that understanding document structure (like section boundaries) is crucial for identifying salient content.
  • It encodes tokens with Longformer, models interactions between sentences with inter-sentence Transformer layers, and adds a Determinantal Point Process (DPP) regularizer to encourage summaries that are both diverse and relevant.
  • It evaluates the model's performance using datasets from scientific articles and lecture transcripts, comparing against strong baselines and measuring success with standard metrics like ROUGE scores.
  • It includes a human assessment of summary quality, focusing on informativeness and diversity, in which the model's summaries were rated favorably.
  • It highlights the model's limitations, such as dependence on accurate section boundaries and potential data biases from pretraining, suggesting avenues for future research.
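The role of the DPP in the notes above can be illustrated with a small sketch. It assumes the common quality-diversity decomposition of a DPP kernel, L[i][j] = q_i * sim(i, j) * q_j, where q_i is a sentence's salience score and sim is cosine similarity between sentence vectors. The function names, the toy vectors, and the greedy selection loop are illustrative assumptions; the paper uses the DPP as a training regularizer, and this is not its implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_kernel(quality, vectors):
    """DPP L-ensemble kernel: L[i][j] = q_i * cos(v_i, v_j) * q_j (assumed form)."""
    n = len(quality)
    return [
        [quality[i] * cosine(vectors[i], vectors[j]) * quality[j] for j in range(n)]
        for i in range(n)
    ]


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def subset_score(kernel, subset):
    """Unnormalized DPP probability of a subset: det of the kernel submatrix."""
    return det([[kernel[i][j] for j in subset] for i in subset])


def greedy_select(kernel, k):
    """Greedily add the sentence that most increases the subset determinant."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(kernel)) if i not in selected]
        best = max(candidates, key=lambda i: subset_score(kernel, selected + [i]))
        selected.append(best)
    return selected


# Toy data: sentences 0 and 1 are near-duplicates, sentence 2 is different.
quality = [0.9, 0.85, 0.6]
vectors = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
kernel = build_kernel(quality, vectors)
print(greedy_select(kernel, 2))  # [0, 2]
```

Although sentence 1 has a higher salience score than sentence 2, the determinant penalizes its overlap with sentence 0, so the less redundant sentence is chosen. This is the intuition behind using a DPP term to diversify the selected summary sentences.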

Quotes

Abstract

Text segmentation is important for signaling a document's structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem is only exacerbated by a lack of segmentation in transcripts of audio / video recordings. In this paper, we explore the role that section segmentation plays in extractive summarization of written and spoken documents. Our approach learns robust sentence representations by performing summarization and segmentation simultaneously, which is further enhanced by an optimization-based regularizer to promote selection of diverse summary sentences. We conduct experiments on multiple datasets ranging from scientific articles to spoken transcripts to evaluate the model's performance. Our findings suggest that the model can not only achieve state-of-the-art performance on publicly available benchmarks, but demonstrate better cross-genre transferability when equipped with text segmentation. We perform a series of analyses to quantify the impact of section segmentation on summarizing written and spoken documents of substantial length and complexity.
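As a rough illustration of "performing summarization and segmentation simultaneously", the sketch below sums two binary cross-entropy terms over the same sentences: one for whether a sentence belongs in the summary, one for whether it starts a new section. The function names, the `seg_weight` parameter, and the exact form of the objective are assumptions made for illustration, not the paper's stated formulation.

```python
import math


def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def joint_loss(summary_probs, summary_labels,
               boundary_probs, boundary_labels, seg_weight=1.0):
    """Hypothetical joint objective: summarization loss plus weighted segmentation loss."""
    return (bce(summary_probs, summary_labels)
            + seg_weight * bce(boundary_probs, boundary_labels))


# Four sentences: which ones are summary-worthy, and which ones open a section.
summary_labels = [1, 0, 0, 1]
boundary_labels = [1, 0, 1, 0]
loss = joint_loss([0.8, 0.2, 0.1, 0.7], summary_labels,
                  [0.9, 0.1, 0.6, 0.2], boundary_labels)
print(round(loss, 4))
```

Because both terms are computed from predictions on the same sentences, a shared sentence encoder trained on this kind of objective receives gradient signal from section structure as well as from summary labels.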

References

Key: 2022 TowardUnifyingTextSegmentationa
Author: Dong Yu, Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu
Title: Toward Unifying Text Segmentation and Long Document Summarization
DOI: 10.48550/arXiv.2210.16422
Year: 2022