
Vers une meilleure utilisabilité des mémoires de traduction, fondée sur un alignement sous-phrastique (Towards better usability of translation memories, based on sub-sentential alignment)

Abstract : Computer-aided translation received a boost in the 1990s with the introduction of translation-memory-based environments. These systems take advantage of the repetitiveness of the technical materials produced and translated in industry by allowing translators to reuse archived translations, thus improving their productivity. Translation memories use text segments (typically whole sentences) that are delineated and aligned thanks to the translators' expertise, and they do not perform any advanced analysis.

However, these memories contain very rich information at the sub-sentential level, and translators cannot currently benefit from it. The TransTree formalism captures nested correspondences between sub-segments of bilingual or multilingual texts. These complex correspondences, called amphigrams, form a tree structure that is easily expressed in XML. With a simple, shallow transformation, a dynamic visualization can be obtained that shows several levels of correspondence between sub-segments.
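As a rough illustration of how nested correspondences can be held in a tree and written out as XML, here is a minimal Python sketch. The class name `Amphigram`, the element names `amphigram`, `source` and `target`, and the English-French example are assumptions made for this illustration. They are not the actual TransTree schema, which this abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Amphigram:
    """Hypothetical node: a source sub-segment paired with a target sub-segment."""
    source: str
    target: str
    children: List["Amphigram"] = field(default_factory=list)


def to_xml(node: Amphigram) -> ET.Element:
    """Serialize a node and its nested correspondences (element names are assumed)."""
    elem = ET.Element("amphigram")
    ET.SubElement(elem, "source").text = node.source
    ET.SubElement(elem, "target").text = node.target
    for child in node.children:
        elem.append(to_xml(child))
    return elem


def depth(node: Amphigram) -> int:
    """Number of correspondence levels below and including this node."""
    return 1 + max((depth(child) for child in node.children), default=0)


# Invented example: three levels of correspondence in one bisegment.
example = Amphigram(
    "the red car", "la voiture rouge",
    [
        Amphigram("the", "la"),
        Amphigram(
            "red car", "voiture rouge",
            [Amphigram("red", "rouge"), Amphigram("car", "voiture")],
        ),
    ],
)

xml_text = ET.tostring(to_xml(example), encoding="unicode")
```

Each nesting level in the XML corresponds to one level of correspondence, which is the kind of structure a shallow transformation could turn into a layered display.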

TransTree comes with a general statistical method for computing this information, based on binary secability trees. This method analyses any bisegment and programmatically produces a TransTree representation from correspondences between the typographical words of the bisegment. Moreover, translation patterns, called generic amphigrams, can be abstracted by applying clustering techniques to examples found in the corpus.
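The idea of recursively cutting a bisegment in two can be sketched as follows. This is a simplified, deterministic stand-in and not the statistical method of the thesis: it assumes the word correspondences are already given as index pairs, and it accepts the first cut that no correspondence crosses, either in the same order ("straight") or in swapped order ("inverted"). The function name `build` and the tuple layout are hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index), assumed given


def build(src: List[str], tgt: List[str], links: Set[Link],
          s0: int, s1: int, t0: int, t1: int):
    """Recursively split src[s0:s1] / tgt[t0:t1] into a binary tree.

    Leaves are (source text, target text); inner nodes are
    (orientation, left subtree, right subtree).
    """
    leaf = (" ".join(src[s0:s1]), " ".join(tgt[t0:t1]))
    if s1 - s0 <= 1 or t1 - t0 <= 1:
        return leaf
    # Only the links that fall inside the current pair of spans matter.
    inside = [(i, j) for i, j in links if s0 <= i < s1 and t0 <= j < t1]
    for s in range(s0 + 1, s1):
        for t in range(t0 + 1, t1):
            # Straight cut: left source words link only to left target words.
            if all((i < s) == (j < t) for i, j in inside):
                return ("straight",
                        build(src, tgt, links, s0, s, t0, t),
                        build(src, tgt, links, s, s1, t, t1))
            # Inverted cut: left source words link only to right target words.
            if all((i < s) != (j < t) for i, j in inside):
                return ("inverted",
                        build(src, tgt, links, s0, s, t, t1),
                        build(src, tgt, links, s, s1, t0, t))
    return leaf  # no cut is free of crossing links: keep the span whole


# Invented example with a noun-adjective reordering.
src = ["the", "red", "car"]
tgt = ["la", "voiture", "rouge"]
links = {(0, 0), (1, 2), (2, 1)}
tree = build(src, tgt, links, 0, len(src), 0, len(tgt))
```

A statistical variant would presumably score candidate cuts from corpus evidence instead of taking the first consistent one; that scoring is left out here because the abstract gives no details on it.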

A few experiments were conducted to validate the expressive power of the formalism, to investigate several implementation options, and to introduce an algorithm that reassembles a target string for a previously unseen source segment using knowledge extracted from translation memories.

Cited literature : 79 references
Contributor : Christophe Chenon
Submitted on : Wednesday, April 12, 2006 - 4:52:50 PM
Last modification on : Friday, November 6, 2020 - 3:44:16 AM
Long-term archiving on : Wednesday, September 8, 2010 - 4:23:42 PM


  • HAL Id : tel-00012126, version 1




Christophe Chenon. Vers une meilleure utilisabilité des mémoires de traduction, fondée sur un alignement sous-phrastique. Interface homme-machine [cs.HC]. Université Joseph-Fourier - Grenoble I, 2005. Français. ⟨tel-00012126⟩


