Theses

Vers une meilleure utilisabilité des mémoires de traduction, fondée sur un alignement sous-phrastique (Towards better usability of translation memories, based on sub-sentential alignment)

Abstract : Computer-aided translation received a boost in the 1990s with the introduction of translation-memory-based environments. These systems take advantage of the repetitiveness of the technical material produced and translated in industry, allowing translators to reuse archived translations and thus improve their productivity. Translation memories use text segments (typically whole sentences) that are delineated and aligned on the basis of translators' expertise, and they perform no advanced analysis.
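To make the contrast concrete, here is a minimal sketch of the whole-segment lookup a classic translation memory performs. It assumes a simple word-level similarity computed with Python's standard `difflib.SequenceMatcher`; real TM tools use their own scoring, and the memory entries, the `fuzzy_lookup` name and the 0.7 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source sentence, target sentence) pairs.
TM = [
    ("Press the power button to start the printer.",
     "Appuyez sur le bouton d'alimentation pour démarrer l'imprimante."),
    ("Press the reset button to restart the printer.",
     "Appuyez sur le bouton de réinitialisation pour redémarrer l'imprimante."),
]


def fuzzy_lookup(query, memory, threshold=0.7):
    """Return (score, source, target) for the best whole-sentence match,
    or None if no entry reaches the (hypothetical) threshold."""
    q = query.lower().split()
    best = None
    for src, tgt in memory:
        # Similarity over word tokens: 2 * matched tokens / total tokens.
        score = SequenceMatcher(None, q, src.lower().split()).ratio()
        if best is None or score > best[0]:
            best = (score, src, tgt)
    if best is None or best[0] < threshold:
        return None
    return best


score, src, tgt = fuzzy_lookup("Press the power button to stop the printer.", TM)
print(score)  # 0.875
print(tgt)
```

The match is all-or-nothing at the sentence level: a shared sub-segment such as "the power button" is never offered on its own, which is exactly the sub-sentential information the thesis sets out to exploit.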

However, these memories contain very rich information at sub-sentential levels, from which translators cannot benefit. The TransTree formalism captures nested correspondences between sub-segments of bilingual or multilingual texts. These complex correspondences, called amphigrams, form a tree structure that is easily expressed in XML. A simple, shallow transformation of this XML yields a dynamic visualization that displays several levels of correspondence between sub-segments.
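The nested structure can be illustrated with a small, hypothetical encoding. The abstract does not give the actual TransTree XML schema, so the `Amphigram` class and the `amphigram`/`seg` element names below are assumptions, chosen only to show how nested correspondences map naturally onto nested XML elements.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Amphigram:
    """One correspondence (hypothetical model): the same content seen in
    each language, plus nested sub-correspondences."""
    segments: dict                      # language code -> text span
    children: list = field(default_factory=list)


def to_xml(node):
    """Serialize an Amphigram tree; element names are assumptions."""
    elem = ET.Element("amphigram")
    for lang, text in node.segments.items():
        seg = ET.SubElement(elem, "seg", lang=lang)
        seg.text = text
    for child in node.children:
        elem.append(to_xml(child))      # nesting mirrors the tree
    return elem


tree = Amphigram(
    {"en": "the red car", "fr": "la voiture rouge"},
    [Amphigram({"en": "the", "fr": "la"}),
     Amphigram({"en": "red car", "fr": "voiture rouge"},
               [Amphigram({"en": "red", "fr": "rouge"}),
                Amphigram({"en": "car", "fr": "voiture"})])],
)

root = to_xml(tree)
ET.indent(root)                         # pretty-print (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

Because each level of correspondence is one level of element nesting, a stylesheet can expand or collapse levels independently, which is the kind of multi-level display the paragraph describes.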

TransTree comes with a general statistical method for computing this information, based on binary secability (splittability) trees. This method analyses any bisegment and automatically produces a TransTree representation from the correspondences between the typographical words of the bisegment. Moreover, translation patterns, called generic amphigrams, can be abstracted by applying clustering techniques to examples found in the corpus.
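The following sketch shows one way to build a binary tree from word correspondences in a bisegment. It is not the thesis's statistical method: instead of a learned secability score, it uses hard consistency with given word links (a block may be split only where no link crosses between the halves, in straight or inverted order) and picks the most balanced admissible split. The function name `binary_split`, the tuple-based tree and the example links are hypothetical.

```python
def binary_split(src, tgt, links):
    """Recursively split a bisegment into a binary tree of aligned blocks.

    src, tgt: token lists; links: set of (source_index, target_index).
    Returns ("leaf", src_tokens, tgt_tokens) or (orientation, left, right),
    where orientation is "straight" or "inverted" and `left` always holds
    the first half of the source side.
    """
    def consistent(s_lo, s_hi, t_lo, t_hi):
        # Every link must have both ends inside the block or both outside.
        return all((s_lo <= i < s_hi) == (t_lo <= j < t_hi) for i, j in links)

    def rec(s_lo, s_hi, t_lo, t_hi):
        best = None
        for k in range(s_lo + 1, s_hi):
            for l in range(t_lo + 1, t_hi):
                for orient, left_t, right_t in (
                    ("straight", (t_lo, l), (l, t_hi)),
                    ("inverted", (l, t_hi), (t_lo, l)),
                ):
                    if consistent(s_lo, k, *left_t) and consistent(k, s_hi, *right_t):
                        # Heuristic (assumption): prefer balanced source halves.
                        balance = abs((k - s_lo) - (s_hi - k))
                        if best is None or balance < best[0]:
                            best = (balance, orient, k, left_t, right_t)
        if best is None:
            return ("leaf", src[s_lo:s_hi], tgt[t_lo:t_hi])
        _, orient, k, left_t, right_t = best
        return (orient, rec(s_lo, k, *left_t), rec(k, s_hi, *right_t))

    return rec(0, len(src), 0, len(tgt))


src = ["the", "red", "car"]
tgt = ["la", "voiture", "rouge"]
links = {(0, 0), (1, 2), (2, 1)}   # the-la, red-rouge, car-voiture
print(binary_split(src, tgt, links))
# ('straight', ('leaf', ['the'], ['la']), ('inverted', ('leaf', ['red'], ['rouge']), ('leaf', ['car'], ['voiture'])))
```

Each inverted node marks a place where the two languages order their sub-segments differently, as with the adjective-noun order here; the brute-force search over split points is only practical for sentence-length inputs.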

A few experiments were conducted to validate the expressive power of the formalism, to investigate several implementation options, and to introduce an algorithm that reassembles a target string for a previously unseen source segment using knowledge extracted from translation memories.
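The abstract does not detail the reassembly algorithm, so the following is only a toy illustration of the general idea: a generic pattern with a slot (`$X`), assumed to have been abstracted from examples such as "the red car / la voiture rouge", is matched against a new source segment, and the slot is filled from a lexicon of sub-segment translations. `LEXICON`, `PATTERNS`, `match` and `reassemble` are hypothetical names, and the sketch ignores agreement, ambiguity and scoring.

```python
# Hypothetical lexicon of sub-segment translations (tuples of tokens).
LEXICON = {
    ("red",): ("rouge",),
    ("blue",): ("bleue",),
}

# Hypothetical generic amphigram: "$X" is a slot shared by both sides.
PATTERNS = [
    (("the", "$X", "car"), ("la", "voiture", "$X")),
]


def match(pattern, tokens, bindings):
    """Yield slot bindings under which `pattern` covers `tokens` exactly.

    A slot binds a non-empty token span that has an entry in LEXICON.
    """
    if not pattern:
        if not tokens:
            yield dict(bindings)
        return
    head, rest = pattern[0], pattern[1:]
    if head.startswith("$"):
        for end in range(1, len(tokens) + 1):
            span = tuple(tokens[:end])
            if span in LEXICON:
                bindings[head] = span
                yield from match(rest, tokens[end:], bindings)
                del bindings[head]
    elif tokens and tokens[0] == head:
        yield from match(rest, tokens[1:], bindings)


def reassemble(source):
    """Return a target string for `source`, or None if no pattern applies."""
    tokens = source.split()
    for src_pattern, tgt_pattern in PATTERNS:
        for bindings in match(src_pattern, tokens, {}):
            out = []
            for tok in tgt_pattern:
                # Slots are replaced by the translation of the bound span.
                out.extend(LEXICON[bindings[tok]] if tok.startswith("$") else [tok])
            return " ".join(out)
    return None


print(reassemble("the blue car"))   # la voiture bleue
print(reassemble("the green car"))  # None
```

"the blue car" never needs to appear in the memory: the pattern supplies the reordering and the lexicon supplies the pieces.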
Document type :
Theses

Cited literature [79 references]

https://tel.archives-ouvertes.fr/tel-00012126
Contributor : Christophe Chenon
Submitted on : Wednesday, April 12, 2006 - 4:52:50 PM
Last modification on : Friday, November 6, 2020 - 3:44:16 AM
Long-term archiving on : Wednesday, September 8, 2010 - 4:23:42 PM

Identifiers

  • HAL Id : tel-00012126, version 1

Collections

UJF | IMAG | CNRS | UGA

Citation

Christophe Chenon. Vers une meilleure utilisabilité des mémoires de traduction, fondée sur un alignement sous-phrastique. Human-Computer Interaction [cs.HC]. Université Joseph-Fourier - Grenoble I, 2005. French. ⟨tel-00012126⟩


Metrics

Record views : 536

Files downloads : 31372