Automated Text Generation (NLG) Task
An Automated Text Generation (NLG) Task is a text generation task and an automated natural language processing task that produces natural language expressions, primarily text items.
- Context:
- output: NLG Task Output (machine-written text).
- measure: NLG Performance Measure, such as Syntactic Correctness and Intelligibility.
- It can range from being a Heuristic Language Generation Task to being a Data-Driven Language Generation Task.
- It can range from being a Domain-Specific NLG Task to being an Open-Domain NLG Task.
- It can range from being a Word Generation Task, Phrase Generation Task, Sentence Generation Task, Passage Generation Task, Document Generation Task, ...
- It can range from being a Freeform NLG Task (such as chit chat) to being a Topic-based NLG Task.
- It can range from being a Shallow NLG Task to being a Deep NLG Task, depending on the level of linguistic and semantic understanding required.
- It can range from being a Short-form NLG Task to being a Long-form NLG Task, based on the length and complexity of the generated text.
- It can be solved by an Automated Text Generation System (that implements a text generation algorithm).
- It can be supported by a Natural Language Understanding Task.
- ...
- Example(s):
- NLG Task Types by Linguistic Depth:
- NLG Task Types by Text Length:
- NLG Task Types by Domain Specificity:
- Domain-Specific NLG Tasks, such as: Medical NLG and Legal NLG.
- Open-Domain NLG Tasks, such as: Automated Question Answering or Automated Summarization.
- NLG Task Types by Linguistic Unit:
- Word Generation Tasks, such as: Automated Synonym Generation.
- Phrase Generation Tasks, such as: Automated Paraphrasing.
- Sentence Generation Tasks, such as: Automated Definitional Sentence Generation and Automated Image Description Generation.
- Passage Generation Tasks, such as: Automated Summarization (e.g., contract summarization).
- Document Generation Tasks, such as: Automated Wikipedia Page Creation or Automated Contract Drafting.
- NLG Task Types by Topical Constraint:
- Freeform NLG Tasks, such as: CJS Neural Narrative Text Generation Task.
- Automated Domain-Specific NLG, such as: Medical NLG, Legal NLG, and Software NLG.
- Constrained NLG Task, such as: Generate Text(length={200}, subject='history', vocabulary='advanced', tone='formal', structure='intro, body, conclusion', deadline='2023-12-31', sentiments='neutral', audience='adults') => “Introduction about the subject of history. Detailed body text employing an advanced vocabulary and a formal tone. Conclusive remarks. Completed before the specified deadline, aimed at an adult audience with a neutral sentiment.”
- Writing Assistance Tasks, such as: ...
- Data-to-Text Generation Task, such as: Data-to-Text Generation via Template-based Design.
- Machine Translation Task.
- ...
- Counter-Example(s):
- See: Content Planning, Document Structuring, Lexical Choice, Narrative Generation, Pragmatic Analysis, Semantic Analysis, Surface Realization, Text Planning, Text Structuring.
References
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Natural-language_generation Retrieved:2021-2-20.
- Natural-language generation (NLG) is a software process that transforms structured data into natural language. It can be used to produce long form content for organizations to automate custom reports, as well as produce custom content for a web or mobile application. It can also be used to generate short blurbs of text in interactive conversations (a chatbot) which might even be read out by a text-to-speech system.
Automated NLG can be compared to the process humans use when they turn ideas into writing or speech. Psycholinguists prefer the term language production for this process, which can also be described in mathematical terms, or modeled in a computer for psychological research. NLG systems can also be compared to translators of artificial computer languages, such as decompilers or transpilers, which also produce human-readable code generated from an intermediate representation. Human languages tend to be considerably more complex and allow for much more ambiguity and variety of expression than programming languages, which makes NLG more challenging.
NLG may be viewed as the opposite of natural-language understanding (NLU): whereas in natural-language understanding, the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words. The practical considerations in building NLU vs. NLG systems are not symmetrical. NLU needs to deal with ambiguous or erroneous user input, whereas the ideas the system wants to express through NLG are generally known precisely. NLG needs to choose a specific, self-consistent textual representation from many potential representations, whereas NLU generally tries to produce a single, normalized representation of the idea expressed.[1]
NLG has existed since ELIZA was developed in the mid-1960s, but commercial NLG technology has only recently become widely available. NLG techniques range from simple template-based systems like a mail merge that generates form letters, to systems that have a complex understanding of human grammar. NLG can also be accomplished by training a statistical model using machine learning, typically on a large corpus of human-written texts.
- ↑ Dale, Robert; Reiter, Ehud (2000). Building natural language generation systems. Cambridge, U.K.: Cambridge University Press. ISBN 978-0-521-02451-8.
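The template-based end of the range described in the quote above (a mail merge that generates form letters) can be illustrated with a minimal sketch. It assumes Python's standard `string.Template`; the template wording, the field names (`name`, `product`, `ship_date`), and the `generate_letters` helper are hypothetical and are not taken from any cited system.

```python
from string import Template

# Hypothetical form-letter template; the field names are illustrative only.
LETTER_TEMPLATE = Template("Dear $name, your $product order will ship on $ship_date.")


def generate_letters(records):
    """Fill the template once per structured record (a dict of field values)."""
    # substitute() raises KeyError if a record lacks a field the template needs.
    return [LETTER_TEMPLATE.substitute(record) for record in records]


# Structured input data: one dict per letter to generate.
records = [
    {"name": "Ada", "product": "keyboard", "ship_date": "Monday"},
    {"name": "Lin", "product": "monitor", "ship_date": "Friday"},
]

letters = generate_letters(records)
for letter in letters:
    print(letter)
```

A template system like this makes no linguistic decisions of its own: every sentence shape is fixed in advance, which is why it sits at the simple end of the range the quote describes.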
2018
- (Lee et al., 2018) ⇒ Chris van der Lee, Emiel Krahmer, and Sander Wubben. (2018). “Automated Learning of Templates for Data-to-text Generation: Comparing Rule-based, Statistical and Neural Methods.” In: Proceedings of the 11th International Conference on Natural Language Generation (INLG 2018). DOI:10.18653/v1/W18-6504
- (Song et al., 2018) ⇒ Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. (2018). “A Graph-to-Sequence Model for AMR-to-Text Generation.” In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018) Volume 1: Long Papers. DOI:10.18653/v1/P18-1150
- (Guo et al., 2018) ⇒ Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. (2018). “Long Text Generation via Adversarial Training with Leaked Information.” In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18).
- (Fedus et al., 2018) ⇒ William Fedus, Ian Goodfellow, and Andrew M. Dai. (2018). “MaskGAN: Better Text Generation via Filling in the ________.” In: Proceedings of the Sixth International Conference on Learning Representations (ICLR-2018).
- (Clark et al., 2018) ⇒ Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. (2018). “Neural Text Generation in Stories Using Entity Representations As Context.” In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Volume 1 (Long Papers). DOI:10.18653/v1/N18-1204.
- (Kudo & Richardson, 2018) ⇒ Taku Kudo, and John Richardson. (2018). “SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing.” In: arXiv preprint arXiv:1808.06226.
- (Zhu et al., 2018) ⇒ Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. (2018). “Texygen: A Benchmarking Platform for Text Generation Models.” In: Proceedings of The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018). DOI:10.1145/3209978.3210080.
2017
- (Semeniuta et al., 2017) ⇒ Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. (2017). “A Hybrid Convolutional Variational Autoencoder for Text Generation.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). DOI:10.18653/v1/D17-1066.
- (Zhang et al., 2017) ⇒ Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin. (2017). “Adversarial Feature Matching for Text Generation.” In: Proceedings of the 34th International Conference on Machine Learning (ICML 2017).
- (Li et al., 2017) ⇒ Jiwei Li, Will Monroe, Tianlin Shi, Sebastien Jean, Alan Ritter, and Dan Jurafsky. (2017). “Adversarial Learning for Neural Dialogue Generation.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). DOI:10.18653/v1/D17-1230.
- (Lin, Li, et al., 2017) ⇒ Kevin Lin, Dianqi Li, Xiaodong He, Ming-ting Sun, and Zhengyou Zhang. (2017). “Adversarial Ranking for Language Generation.” In: Proceedings of Advances in Neural Information Processing Systems 30 (NIPS-2017).
- (Che et al., 2017) ⇒ Tong Che, Yanran Li, Ruixiang Zhang, R. Devon Hjelm, Wenjie Li, Yangqiu Song, and Yoshua Bengio. (2017). “Maximum-Likelihood Augmented Discrete Generative Adversarial Networks.” In: ArXiv Preprint: 1702.07983.
- (Yu et al., 2017a) ⇒ Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. (2017). “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient.” In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017).
2017h
- https://github.com/pytorch/examples/tree/master/word_language_model
- QUOTE: This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task. By default, the training script uses the WikiText-2 dataset, provided. The trained model can then be used by the generate script to generate new text.
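The train-then-generate workflow described in the quote can be sketched at toy scale. The linked example trains an RNN on WikiText-2; the sketch below deliberately swaps in a much simpler count-based bigram model over a made-up one-line corpus, so it illustrates only the two-step shape (fit a language model, then sample new text from it). The function names `train_bigram_model` and `generate`, and the corpus, are hypothetical and are not part of the PyTorch example.

```python
import random
from collections import defaultdict


def train_bigram_model(tokens):
    """Record, for each word, every word that followed it in the training text."""
    successors = defaultdict(list)
    for current_word, next_word in zip(tokens, tokens[1:]):
        successors[current_word].append(next_word)
    return dict(successors)


def generate(model, start_word, max_words, rng):
    """Sample a continuation one word at a time, stopping at a dead end."""
    words = [start_word]
    while len(words) < max_words and words[-1] in model:
        # Repeated successors stay in the list, so frequent continuations
        # are sampled proportionally more often.
        words.append(rng.choice(model[words[-1]]))
    return words


# Toy training text (an assumption for illustration, not WikiText-2).
corpus = "the cat sat on the mat and the dog sat on the rug".split()

model = train_bigram_model(corpus)
sample = generate(model, "the", max_words=8, rng=random.Random(0))
print(" ".join(sample))
```

Every adjacent word pair in the generated text was seen in the training text, which is the defining limitation of a bigram model; neural language models such as the RNNs in the linked example condition on longer histories.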
2016
- (Kusner & Hernández-Lobato, 2016) ⇒ Matt J. Kusner, and José Miguel Hernández-Lobato. (2016). “GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution.” In: arXiv:1611.04051.
2015a
- (Bahdanau et al., 2015) ⇒ Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. (2015). “Neural Machine Translation by Jointly Learning to Align and Translate.” In: Proceedings of the Third International Conference on Learning Representations (ICLR-2015).