Automated Legal Text Summarization Task


An Automated Legal Text Summarization Task is a legal text summarization task that is a legal NLG task.


References

2023

  • (Sharma et al., 2023) ⇒ Saloni Sharma, Surabhi Srivastava, Pradeepika Verma, Anshul Verma, and Sachchida Nand Chaurasia. (2023). “A Comprehensive Analysis of Indian Legal Documents Summarization Techniques.” In: SN Computer Science, 4(5).
    • ABSTRACT: In the Legal AI field, the summarization of legal documents is very challenging, since Indian case documents are noisy and poorly organized. Summarization of legal documents can be useful for legal professionals, who often have to read and analyze large amounts of legal text. During the review process of legal documents, a team of reviewers may be needed to understand them and take further action. A branch of text summarization called ‘legal text summarization’, which is concerned with summarizing legal texts such as court opinions, contracts, and legal briefs, may reduce the need for these reviewers. Legal text summarization aims to highlight the key points of a legal document and convey them in a concise form so that decisions can be made quickly. In this paper, we experimented with seven machine learning-based summarization models to analyse their performance on a judgment report dataset collected from the Indian national legal portal. The models taken here for the analysis are BART, LexRank, TextRank, Luhn, LSA, Legal Pegasus, and Longformer. We experimented with these models to find which may perform well on legal data. As a result, we observed that Legal Pegasus outperforms all other models in legal case summarization.
    • https://github.com/Saloni-sharma29/Summarizing-Indian-Legal-Documents-A-Comparative-Study-of-Models-and-Techniques
    • QUOTE: To conduct our research on Indian legal document summarization, we used 30 publicly available legal documents from the Indian Kanoon website (www.indiankanoon.org). We have uploaded these documents to our GitHub repository for the purpose of our research. It is important to note that Indian Kanoon is a third-party website and we do not claim any ownership or responsibility for the content available on their website. We have provided proper attribution and citation to Indian Kanoon as the source of our legal documents in this repository. Users are advised to refer to the Indian Kanoon website for any legal or official purposes.
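For illustration, the kind of comparison described in the entry above can be reproduced at small scale with off-the-shelf checkpoints. The sketch below is a minimal, hypothetical example rather than the authors' code: it summarizes a single judgment with a public Legal Pegasus checkpoint via Hugging Face transformers, where the checkpoint name nsi319/legal-pegasus, the input file judgment.txt, and the decoding settings are all assumptions.

```python
# Minimal sketch (not the paper's code): summarize one Indian court judgment
# with a publicly available Legal Pegasus checkpoint on Hugging Face.
# The checkpoint choice and input file are assumptions for illustration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "nsi319/legal-pegasus"  # assumed checkpoint, not confirmed by the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One judgment document, e.g. downloaded from indiankanoon.org (hypothetical file).
judgment_text = open("judgment.txt", encoding="utf-8").read()

# Pegasus-style encoders have a bounded input length, so long judgments are
# truncated here; full-length documents would need a chunk-and-merge strategy.
inputs = tokenizer(judgment_text, truncation=True, max_length=1024,
                   return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=256,
                             early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The extractive baselines in the study (LexRank, TextRank, Luhn, LSA) could be run analogously with an extractive-summarization library, and the outputs compared by ROUGE against reference summaries.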

2020

  • (Zhang, Zhao et al., 2020) ⇒ Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. (2020). “PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.” In: International Conference on Machine Learning (ICML 2020), pp. 11328-11339. PMLR.
    • ABSTRACT: Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally, we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.
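The gap-sentence generation (GSG) objective described in the abstract above can be illustrated with a toy sketch. The code below is a deliberate simplification, not the released PEGASUS implementation: it approximates the paper's ROUGE-based "principal" sentence selection with plain unigram overlap, and the function name, mask token, and gap ratio are illustrative assumptions.

```python
# Toy sketch of PEGASUS-style gap-sentence generation (GSG): the most
# "important" sentences are masked in the input and concatenated as the
# generation target. Importance is approximated here by unigram overlap
# with the rest of the document (a simplification of the paper's
# ROUGE1-F1-based "principal" selection).
import re

def gsg_example(document: str, gap_ratio: float = 0.3):
    # Naive sentence splitting on end-of-sentence punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())

    def importance(i: int) -> float:
        # Fraction of sentence i's words that also appear elsewhere.
        words = set(sentences[i].lower().split())
        rest = set(w for j, s in enumerate(sentences) if j != i
                   for w in s.lower().split())
        return len(words & rest) / max(len(words), 1)

    # Mask the top gap_ratio fraction of sentences by importance.
    n_gaps = max(1, int(len(sentences) * gap_ratio))
    gaps = set(sorted(range(len(sentences)), key=importance,
                      reverse=True)[:n_gaps])

    source = " ".join("[MASK1]" if i in gaps else s
                      for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(gaps))
    return source, target  # model input and pre-training target
```

In pre-training, the encoder-decoder model would read the masked source and learn to generate the target, so that the objective resembles producing an extractive-style summary of the document.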