Aligned parallel text

A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of corresponding sentences in the two halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla (Greek for "sixfold") placed six versions of the Old Testament side by side. The most famous example of a parallel text is the Rosetta Stone.

Large collections of parallel texts are called parallel corpora (see text corpus). Alignment of parallel corpora at the sentence level is a prerequisite for many areas of linguistic research. During translation, sentences can be split, merged, deleted, inserted or reordered by the translator, so the sentences of the two texts do not necessarily correspond one to one. This makes alignment a non-trivial task.
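To illustrate why splits, merges and deletions complicate alignment, here is a minimal sketch of a length-based sentence aligner using dynamic programming. It is loosely inspired by length-based methods such as Gale and Church's, but it is not that algorithm: the cost (absolute difference in character length) and the penalty values in `BEADS` are illustrative assumptions, and reordering is not handled at all.

```python
from typing import List, Tuple

# Allowed "beads": (source sentences consumed, target sentences consumed),
# each with an illustrative penalty. The values are assumptions, not tuned.
BEADS = {
    (1, 1): 0,   # one sentence translated as one sentence
    (1, 0): 25,  # source sentence deleted by the translator
    (0, 1): 25,  # target sentence inserted by the translator
    (2, 1): 8,   # two source sentences merged into one
    (1, 2): 8,   # one source sentence split into two
}

Bead = Tuple[List[str], List[str]]


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Return the cheapest sequence of beads covering both texts in order."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Compare total character lengths of the grouped sentences.
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(s_len - t_len) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the chosen beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads
```

With a two-sentence source whose second sentence was split into two by the translator, the aligner would be expected to return a 1-1 bead followed by a 1-2 bead. Real aligners add lexical evidence and a statistical length model on top of this basic scheme.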



In the field of translation studies, a bitext is a merged document composed of both the source-language and target-language versions of a given text.

Bitexts are generated by a piece of software called an alignment tool, or a bitext tool, which automatically aligns the original and translated versions of the same text. The tool generally matches these two texts sentence by sentence. A collection of bitexts is called a bitext database or a bilingual corpus, and can be consulted with a search tool.
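As a rough sketch of the data involved, a bitext can be modelled as an ordered list of aligned sentence pairs, and a bitext database as a mapping from document names to bitexts. The `Segment` type, the `search` function and the document names used with it are hypothetical, chosen only for illustration; they do not correspond to any particular tool.

```python
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One aligned pair: a source sentence and its translation."""
    source: str
    target: str


# A bitext keeps its segments in original document order.
Bitext = List[Segment]
# A bitext database maps a (hypothetical) document name to its bitext.
BitextDB = Dict[str, Bitext]


def search(db: BitextDB, term: str) -> List[Tuple[str, int, Segment]]:
    """Return (document, position, segment) for every source sentence
    containing `term`, ignoring case."""
    needle = term.lower()
    hits = []
    for doc, bitext in db.items():
        for pos, seg in enumerate(bitext):
            if needle in seg.source.lower():
                hits.append((doc, pos, seg))
    return hits
```

Because each hit carries its position in the document, a translator consulting the database can look at the neighbouring segments to see the surrounding context.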

Bitexts and translation memories

The concept of the bitext resembles that of the translation memory. The most salient difference is that a translation memory is a database whose segments (matched sentences) are stored independently of their original context, so the original sentence order is lost, whereas a bitext retains the original sentence order. However, some implementations of translation memory, such as Translation Memory eXchange (TMX), a standard XML format for exchanging translation memories between computer-assisted translation (CAT) programs, allow the original order of sentences to be preserved.
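The contrast can be sketched in a few lines. Below, `to_memory` collapses a bitext into a lookup table, discarding order and context, while `to_tmx` writes the same pairs as TMX translation units in their original order. Both function names are hypothetical, and the header attribute values (tool name, `segtype`, `datatype` and so on) are placeholder assumptions; a real exchange file should be checked against the TMX specification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

Pair = Tuple[str, str]  # (source sentence, target sentence)


def to_memory(bitext: List[Pair]) -> Dict[str, str]:
    """Store segments as source -> target lookups; document order and
    context are no longer recoverable, and a repeated source sentence
    keeps only its last translation."""
    return {source: target for source, target in bitext}


def to_tmx(bitext: List[Pair], src_lang: str, tgt_lang: str) -> str:
    """Serialize the pairs as TMX, one <tu> per pair, in original order."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # placeholder values (assumed)
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in bitext:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Reading the `<tu>` elements back in file order recovers the original sentence sequence, which is what lets a TMX file stand in for a bitext when the exporting tool writes its units in document order.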

Bitexts are designed to be consulted by a human translator, not by a machine. As such, small alignment errors or minor discrepancies that would cause a translation memory to fail are of no importance.

In his original 1988 article, Harris also posited that bitext represents how translators hold their source and target texts together in their mental working memories as they progress. However, this hypothesis has not been followed up.

External links

Parallel corpora

  • European Parliament Proceedings Parallel Corpus 1996-2011
  • The Opus project aims at collecting freely available parallel corpora
  • Japanese-English Bilingual Corpus of World Heritage Encyclopedia's Kyoto Articles
  • COMPARA - Portuguese/English parallel corpora
  • TERMSEARCH - English/Russian/French parallel corpora (major international treaties, conventions, agreements, etc.)
  • TradooIT - English/French/Spanish - Free Online tools
  • Nunavut Hansard - English/Inuktitut parallel corpus
  • ParaSol - A parallel corpus of Slavic and other languages
  • Glosbe: Multilanguage parallel corpora with online search interface
  • InterCorp: A multilingual parallel corpus of 20+ languages aligned with Czech, with online search interface
  • myCAT - Olanto, concordancer (open source AGPL) with online search on JCR and UNO corpus
  • TAUS, with online search interface.


  • Parallel text processing bibliography by J. Veronis and M.-D. Mahimon
  • Proceedings of the 2003 Workshop on Building and Using Parallel Texts
  • Proceedings of the 2005 Workshop on Building and Using Parallel Texts

Alignment tools

  • The Hunalign sentence aligner
  • GIZA++ alignment tool
  • An implementation of the Gale and Church sentence alignment algorithm
  • An alignment tool by Jörg Tiedemann


Harris, B. 'Bi-text, a new concept in translation theory', Language Monthly (UK) 54, pp. 8-10, March 1988.
