2022 TowardUnifyingTextSegmentationa

Subject Headings: Long Document Summarization, Text Segmentation, Longformer, Topical Coherence.

Notes

It addresses the challenge of summarizing long documents, such as scientific papers and spoken transcripts, which are hard to summarize because of their length and intricate structure.
It proposes a unified model, Lodoss, that jointly performs text segmentation and extractive summarization in order to enhance sentence representations and diversify sentence selection.
It uses text segmentation to improve summarization and shows that awareness of document structure, such as section boundaries, is crucial for identifying salient content.
It encodes tokens with Longformer and builds sentence representations with inter-sentence Transformer layers, adding a Determinantal Point Process (DPP) regularizer that encourages the selected summary sentences to be both relevant and diverse.
It evaluates the model on datasets of scientific articles and lecture transcripts, comparing it against strong baselines using standard metrics such as ROUGE.
It includes a human evaluation of summary quality, focused on informativeness and diversity, in which the model's summaries are rated positively.
It discusses the model's limitations, such as its dependence on accurate section boundaries and potential biases inherited from pretraining data, and suggests avenues for future research.
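The joint training idea can be sketched in a few lines. This is a minimal illustration, assuming both tasks are framed as per-sentence binary classification over shared sentence representations (one head predicts whether a sentence belongs in the summary, the other whether it starts a new section) and that the two losses are summed with a weight. The function names, the linear heads, and the `seg_weight` parameter are hypothetical; the Longformer and inter-sentence Transformer encoders are replaced here by fixed toy vectors.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def bce(prob, label, eps=1e-9):
    # Binary cross-entropy for a single prediction.
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))

def dot(w, h):
    return sum(a * b for a, b in zip(w, h))

def joint_loss(sent_reprs, summ_labels, seg_labels, w_summ, w_seg, seg_weight=1.0):
    """Mean summarization BCE plus seg_weight times mean segmentation BCE.

    sent_reprs:  one feature vector per sentence, shared by both heads
    summ_labels: 1 if the sentence is in the reference summary, else 0
    seg_labels:  1 if the sentence starts a new section, else 0
    (hypothetical framing, not the paper's exact loss)
    """
    n = len(sent_reprs)
    summ = sum(bce(sigmoid(dot(w_summ, h)), y) for h, y in zip(sent_reprs, summ_labels)) / n
    seg = sum(bce(sigmoid(dot(w_seg, h)), y) for h, y in zip(sent_reprs, seg_labels)) / n
    return summ + seg_weight * seg

# Toy document: four sentences with 2-d features (illustrative values).
reprs = [[2.0, 2.0], [-2.0, 2.0], [2.0, -2.0], [-2.0, -2.0]]
summ_labels = [1, 0, 1, 0]  # explained by the first feature
seg_labels = [1, 1, 0, 0]   # explained by the second feature

untrained = joint_loss(reprs, summ_labels, seg_labels, [0.0, 0.0], [0.0, 0.0])
fitted = joint_loss(reprs, summ_labels, seg_labels, [1.0, 0.0], [0.0, 1.0])
print(untrained > fitted)  # True
```

Because both heads read the same sentence vectors, gradients from the segmentation term would shape the features used for sentence selection; how strongly is governed by the assumed `seg_weight`.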
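The role of the DPP regularizer can be illustrated with the standard quality-diversity form of a DPP kernel, L_ij = q_i * S_ij * q_j, where q_i is a per-sentence quality score and S_ij a similarity between sentences i and j. A common DPP training objective is the negative log-likelihood of the reference sentence set Y, -log(det(L_Y) / det(L + I)); a set of near-duplicate sentences makes det(L_Y) small and is therefore penalized. This is a hedged sketch of that generic objective, assuming cosine similarity and fixed quality scores; it is not claimed to be Lodoss's exact formulation, and all names and values are illustrative.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def log_det(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def dpp_kernel(vectors, quality):
    # Quality-diversity decomposition: L_ij = q_i * cos(v_i, v_j) * q_j.
    n = len(vectors)
    return [[quality[i] * cosine(vectors[i], vectors[j]) * quality[j]
             for j in range(n)] for i in range(n)]

def dpp_nll(kernel, selected):
    """-log P(Y = selected) = -(log det L_Y - log det(L + I))."""
    n = len(kernel)
    sub = [[kernel[i][j] for j in selected] for i in selected]
    l_plus_i = [[kernel[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return -(log_det(sub) - log_det(l_plus_i))

# Three toy sentence vectors: s0 and s1 are near-duplicates, s2 is different.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
quality = [1.0, 1.0, 1.0]
kernel = dpp_kernel(vectors, quality)

redundant = dpp_nll(kernel, [0, 1])
diverse = dpp_nll(kernel, [0, 2])
print(redundant > diverse)  # True: the diverse pair is more likely under the DPP
```

Raising a sentence's quality score q_i increases the probability of sets containing it, while high similarity to other selected sentences lowers it, which is how a single term can reward both relevance and diversity.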
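ROUGE, the main automatic metric in the evaluation, scores a summary by its n-gram overlap with a reference. The sketch below computes ROUGE-N precision, recall, and F1 with clipped n-gram counts. It assumes lowercased whitespace tokenization without stemming, so its numbers will not exactly match the official ROUGE toolkit or the `rouge-score` package; the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for ROUGE-N with clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = rouge_n("the model segments the document",
                  "the model summarizes the long document")
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.667 0.727
```

Here "the" (twice), "model", and "document" overlap, giving 4 matched unigrams out of 5 candidate and 6 reference tokens.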


	Author	title	doi	year
2022 TowardUnifyingTextSegmentationa	Dong Yu, Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Fei Liu	Toward Unifying Text Segmentation and Long Document Summarization	10.48550/arXiv.2210.16422	2022