2002 SummarizationBeyondSentenceExtraction

Subject Headings: Abstractional Multi-Document Summarization Algorithm.

Notes

  • This is an extended version of the paper (Knight & Marcu, 2000) ⇒ Kevin Knight, and Daniel Marcu. (2000). “Statistics-based Summarization - Step One: Sentence Compression.” In: Proceedings of AAAI 2000 (AAAI 2000).

Cited By

Quotes

Author Keywords

Summarization; Compression; Noisy-channel model

Abstract

When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. In this paper, we focus on sentence compression, a simpler version of this larger challenge. We aim to achieve two goals simultaneously: our compressions should be grammatical, and they should retain the most important pieces of information. These two goals can conflict. We devise both a noisy-channel and a decision-tree approach to the problem, and we evaluate results against manual compressions and a simple baseline.
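
The noisy-channel idea named in the abstract can be illustrated with a small sketch. It treats the long sentence t as a "noisy" expansion of a short sentence s and picks the s that maximizes P(s) · P(t | s). This is not the authors' implementation: the paper's models are not reproduced on this page, so the sketch swaps in a word-level toy. The bigram table `BIGRAM_P`, the insertion table `INSERT_P`, the default values, and the function names are all hypothetical and invented for illustration.

```python
import itertools
import math

# Hypothetical toy parameters, invented for this illustration only.
# Source model P(s): bigram probabilities over candidate compressions.
BIGRAM_P = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.3,
    ("the", "old"): 0.1,
    ("the", "very"): 0.05,
    ("very", "old"): 0.5,
    ("old", "cat"): 0.3,
    ("cat", "slept"): 0.4,
    ("slept", "</s>"): 0.6,
}
UNSEEN_P = 1e-6  # probability assigned to any bigram missing from the table

# Channel model P(t | s): probability that the channel inserted a given word
# while expanding the short sentence into the long one.
INSERT_P = {"very": 0.4, "old": 0.3}
DEFAULT_INSERT_P = 1e-4  # content words are unlikely to be channel insertions


def source_logp(words):
    """Log-probability of a candidate compression under the toy bigram model."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(BIGRAM_P.get(pair, UNSEEN_P))
               for pair in zip(padded, padded[1:]))


def channel_logp(dropped):
    """Log-probability that the channel added exactly the dropped words."""
    return sum(math.log(INSERT_P.get(w, DEFAULT_INSERT_P)) for w in dropped)


def compress(long_words):
    """Exhaustively score every non-empty word subsequence and return the best."""
    best, best_score = None, -math.inf
    for mask in itertools.product([True, False], repeat=len(long_words)):
        kept = [w for w, keep in zip(long_words, mask) if keep]
        if not kept:
            continue
        dropped = [w for w, keep in zip(long_words, mask) if not keep]
        score = source_logp(kept) + channel_logp(dropped)
        if score > best_score:
            best, best_score = kept, score
    return best, best_score


if __name__ == "__main__":
    best, score = compress("the very old cat slept".split())
    print(" ".join(best), math.exp(score))
```

The two terms of the score correspond to the two goals in the abstract: the source model rewards compressions that read as fluent text, while the channel model penalizes dropping words that are unlikely to have been added as filler. Exhaustive enumeration is only practical for very short sentences; it is used here to keep the sketch self-contained.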


References

  • 1. M. Banko, V. Mittal and M. Witbrock, Headline generation based on statistical translation. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000), Hong Kong (2000), pp. 318–325.
  • 2. Regina Barzilay, N. Elhadad and Kathleen R. McKeown, Sentence ordering in multidocument summarization. In: Proceedings of the First International Conference on Human Language Technology Research (HLT-01), San Diego, CA (2001), pp. 149–156.
  • 3. Regina Barzilay, Kathleen R. McKeown and M. Elhadad, Information fusion in the context of multi-document summarization. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-99), University of Maryland (1999), pp. 550–557.
  • 4. A. Berger and J. Lafferty, Information retrieval as statistical translation. In: Proceedings of the 22nd Conference on Research and Development in Information Retrieval (SIGIR-99), Berkeley, CA (1999), pp. 222–229.
  • 5. A. Berger and V. Mittal, Query-relevant summarization using FAQs. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000), Hong Kong (2000), pp. 294–301.
  • 6. Peter F. Brown, S. Della Pietra, V. Della Pietra and R. Mercer, The mathematics of statistical machine translation: Parameter estimation. Comput. Linguistics 19 (2) (1993), pp. 263–311.
  • 7. O. Buyukkokten, H. Garcia-Molina and A. Paepcke, Seeing the whole in parts: Text summarization for web browsing on handheld devices. In: Proceedings of the 10th International WWW Conference, Hong Kong, China (2001).
  • 8. Y. Canning, J. Tait, J. Archibald and R. Crawley, Cohesive generation of syntactically simplified newspaper text. In: Workshop on Robust Methods in Analysis of Natural Language Data, Lausanne (2000), pp. 145–150.
  • 9. J. Carroll, G. Minnen, Y. Canning, S. Devlin and J. Tait, Practical simplification of English newspaper text to assist aphasic readers. In: Proceedings of the AAAI-98 Workshop on Integrating AI and Assistive Technology, Madison, WI (1998).
  • 10. R. Chandrasekar, C. Doran and B. Srinivas, Motivations and methods for text simplification. In: Proceedings of the International Conference on Computational Linguistics (COLING-96), Copenhagen (1996).
  • 11. K. Church, A stochastic parts program and noun phrase parser for unrestricted text. In: Proceedings of the Second Conference on Applied Natural Language Processing, Austin, TX (1988), pp. 136–143.
  • 12. M. Collins, Three generative lexicalized models for statistical parsing. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97), Madrid, Spain (1997), pp. 16–23.
  • 13. Gregory Grefenstette, Producing intelligent telegraphic text reduction to provide an audio scanning service for the blind. In: Working Notes of the AAAI Spring Symposium on Intelligent Text Summarization, Stanford University, CA (1998), pp. 111–118.
  • 14. F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA (1997).
  • 15. H. Jing and Kathleen R. McKeown, The decomposition of human-written summary sentences. In: Proceedings of the 22nd Conference on Research and Development in Information Retrieval (SIGIR-99), Berkeley, CA (1999).
  • 16. R. Jin and A. Hauptmann, Title generation for machine-translated documents. In: Proceedings of IJCAI-01, Seattle, WA (2001), pp. 1229–1234.
  • 17. K. Knight and J. Graehl, Machine transliteration. Comput. Linguistics 24 (4) (1998), pp. 599–612.
  • 18. I. Langkilde, Forest-based statistical sentence generation. In: Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics, Seattle, WA (2000).
  • 19. N. Linke-Ellis, Closed captioning in America: Looking beyond compliance. In: Proceedings of the TAO Workshop on TV Closed Captions for the Hearing Impaired People, Tokyo, Japan (1999), pp. 43–59.
  • 20. D. Magerman, Statistical decision-tree models for parsing. In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, MA (1995), pp. 276–283.
  • 21. K. Mahesh, Hypertext summary extraction for fast document browsing. In: Proceedings of AAAI Spring Symposium on Natural Language Processing for the World Wide Web, Stanford, CA (1997), pp. 95–104.
  • 22. I. Mani, B. Gates and E. Bloedorn, Improving summaries by revising them. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, MD (1999), pp. 558–565.
  • 23. I. Mani and M. Maybury, Editors, Advances in Automatic Text Summarization, MIT Press, Cambridge, MA (1999).
  • 24. D. Marcu, L. Carlson and M. Watanabe, The automatic translation of discourse structures. In: Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics NAACL-2000, Seattle, WA (2000), pp. 9–17.
  • 25. D. Marcu and L. Gerber, An inquiry into the nature of multidocument abstracts, extracts, and their evaluation. In: Proceedings of the NAACL-01 Workshop on Text Summarization, Pittsburgh, PA (2001).
  • 26. Kathleen R. McKeown, J. Klavans, V. Hatzivassiloglou, Regina Barzilay and E. Eskin, Towards multidocument summarization by reformulation: Progress and prospects. In: Proceedings of AAAI-99, Orlando, FL (1999), pp. 453–460.
  • 27. J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA (1993).
  • 28. J. Robert-Ribes, S. Pfeiffer, R. Ellison and D. Burnham, Semi-automatic captioning of TV programs, an Australian perspective. In: Proceedings of the TAO Workshop on TV Closed Captions for the Hearing Impaired People, Tokyo, Japan (1999), pp. 87–100.
  • 29. M. Witbrock and V. Mittal, Ultra-summarization: A statistical approach to generating highly condensed non-extractive summaries. In: Proceedings of the 22nd International Conference on Research and Development in Information Retrieval (SIGIR-99), Poster Session, Berkeley, CA (1999), pp. 315–316.
  • 30. J. Zelle and R. Mooney, Learning to parse database queries using inductive logic programming. In: Proceedings AAAI-96, Portland, OR (1996), pp. 1050–1055.

2002 SummarizationBeyondSentenceExtraction
  • Author: Daniel Marcu, Kevin Knight
  • Title: Summarization Beyond Sentence Extraction: A probabilistic approach to sentence compression
  • URL: http://www.isi.edu/~marcu/papers/aij02-compression.pdf