Parallel Corpus

From GM-RKB

A Parallel Corpus is a text corpus that contains both source texts and their translations.



References

2017a

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Parallel_text Retrieved:2017-5-28.
    • A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla (Greek for "sixfold") placed six versions of the Old Testament side by side. The most famous example is the Rosetta Stone.

      Large collections of parallel texts are called parallel corpora (see text corpus). Alignments of parallel corpora at sentence level are prerequisite for many areas of linguistic research.

      During translation, sentences can be split, merged, deleted, inserted or reordered by the translator. This makes alignment a non-trivial task.
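
The alignment problem described above can be illustrated with a small dynamic-programming sketch. It is a simplified, length-based aligner written for illustration only, not the method of any particular tool: the bead types, the penalty values in `BEADS`, and the function name `align` are all assumptions. It handles one-to-one matches, splits, merges, deletions and insertions, but not reordering, because it only considers alignments that keep sentences in their original order.

```python
from typing import List, Tuple

# Allowed alignment "beads": (source sentences consumed, target sentences consumed).
# The penalties are illustrative assumptions, not values from a published aligner.
BEADS = {
    (1, 1): 0.0,  # one source sentence matches one target sentence
    (1, 0): 4.0,  # source sentence deleted by the translator
    (0, 1): 4.0,  # target sentence inserted by the translator
    (2, 1): 1.0,  # two source sentences merged into one
    (1, 2): 1.0,  # one source sentence split into two
}


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return order-preserving beads as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                # Relative difference in character length, between 0 and 1.
                mismatch = abs(s_len - t_len) / max(s_len, t_len, 1)
                candidate = cost[i][j] + penalty + mismatch
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the cheapest path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


source = ["Short.", "This is a much longer sentence that was split.", "End."]
target = ["Kort.", "Dit is een veel langere zin.", "Hij werd gesplitst.", "Einde."]
for src_idx, tgt_idx in align(source, target):
    print(src_idx, tgt_idx)
```

Character length is only a rough signal; practical aligners usually combine it with a statistical length model or lexical evidence.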

2017b

2007

  • (McEnery & Xiao, 2007) ⇒ McEnery, A., & Xiao, R. (2007). Parallel and comparable corpora: What is happening? In Incorporating Corpora: The Linguist and the Translator, 18-31. http://core.ac.uk/download/pdf/71933.pdf
    • (...) A parallel corpus can be defined as a corpus that contains source texts and their translations. Parallel corpora can be bilingual or multilingual. They can be uni-directional (e.g. from English into Chinese or from Chinese into English alone), bi-directional (e.g. containing both English source texts with their Chinese translations as well as Chinese source texts with their English translations), or multi-directional (e.g. the same piece of writing with English, French and German versions). In this sense, texts which are produced simultaneously in different languages (e.g. EU and UN regulations) also belong to the category of parallel corpora (cf. Hunston, 2002: 15).
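
The distinction between uni-directional, bi-directional and multi-directional corpora can be made concrete with a small data-structure sketch. The `TranslationPair` class and the `directionality` function are hypothetical names introduced here for illustration. The rule that any corpus involving more than two languages counts as multi-directional is a simplifying assumption, not part of the quoted definition.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TranslationPair:
    """One source text and its translation (hypothetical record layout)."""
    source_lang: str
    target_lang: str
    source_text: str
    target_text: str


def directionality(pairs: Iterable[TranslationPair]) -> str:
    """Classify a corpus by the translation directions it contains."""
    directions = {(p.source_lang, p.target_lang) for p in pairs}
    if not directions:
        raise ValueError("corpus contains no translation pairs")
    languages = {lang for direction in directions for lang in direction}
    if len(languages) > 2:
        # Assumption: more than two languages is treated as multi-directional.
        return "multi-directional"
    if len(directions) == 1:
        return "uni-directional"
    return "bi-directional"


en_zh = TranslationPair("en", "zh", "Hello.", "你好。")
zh_en = TranslationPair("zh", "en", "谢谢。", "Thank you.")
print(directionality([en_zh]))
print(directionality([en_zh, zh_en]))
```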

2005

  • (Koehn, 2005) ⇒ Koehn, P. (2005, September). Europarl: A parallel corpus for statistical machine translation. In MT Summit (Vol. 5, pp. 79-86).

1997