2023 SyntheticallyGeneratedTextforSu

From GM-RKB

Subject Headings: Synthetic Text Generation.

Notes

Cited By

Quotes

Abstract

Supervised text models are a valuable tool for political scientists but present several obstacles to their use, including the expense of hand-labeling documents, the difficulty of retrieving rare relevant documents for annotation, and copyright and privacy concerns involved in sharing annotated documents. This article proposes a partial solution to these three issues, in the form of controlled generation of synthetic text with large language models. I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, and a simple technique for improving the quality of synthetic text. I demonstrate the usefulness of synthetic text with three applications: generating synthetic tweets describing the fighting in Ukraine, synthetic news articles describing specified political events for training an event detection system, and a multilingual corpus of populist manifesto statements for training a sentence-level populism classifier.

References

Andrew Halterman. (2023). "Synthetically Generated Text for Supervised Text Analysis." doi:10.48550/arXiv.2303.16028