
Modèle de traduction statistique à fragments enrichi par la syntaxe (Phrase-based statistical machine translation model enriched with syntax)

Abstract: Traditional Statistical Machine Translation (SMT) models are not aware of linguistic structure, so target lexical choices and word order are controlled only by surface-based statistics learned from the training corpus. Knowledge of linguistic structure can nevertheless be beneficial, since it provides generic information that compensates for data sparsity. The purpose of our work is to study the impact of syntactic information while preserving the general framework of Phrase-Based SMT. First, we study the integration of syntactic information using a reranking approach. We define features measuring the similarity between the dependency structures of the source and target sentences, as well as features measuring the linguistic coherence of the target sentences. The importance of each feature is assessed by learning its weight with a Structured Perceptron algorithm. The evaluation of several reranking models shows that these features often improve the quality of the translations produced by the baseline model when judged by manual evaluation rather than by automatic metrics. Second, we propose several models intended to increase the quality and diversity of the search graph produced by the decoder by filtering out uninteresting hypotheses on the basis of the source syntactic structure. This is done either by learning limits on phrase reordering or by decomposing the source sentence in order to simplify the translation process. Initial evaluations of these models look promising.
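The reranking step described in the abstract can be illustrated with a minimal sketch of a structured perceptron over n-best lists. This is not the thesis's implementation: the feature names (`baseline`, `dep_sim`), the toy data, and the choice of an "oracle" candidate index per sentence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]           # feature name -> value (names are hypothetical)
Example = Tuple[List[Features], int]  # (n-best candidates, index of the oracle candidate)


def score(weights: Features, feats: Features) -> float:
    """Linear model score: dot product of weights and candidate features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def rerank(weights: Features, nbest: List[Features]) -> int:
    """Return the index of the highest-scoring candidate (first one on ties)."""
    return max(range(len(nbest)), key=lambda i: score(weights, nbest[i]))


def train_perceptron(data: List[Example], epochs: int = 5) -> Features:
    """Structured perceptron: when the model's top candidate is not the oracle,
    move the weights toward the oracle's features and away from the prediction's."""
    weights: Features = {}
    for _ in range(epochs):
        for nbest, oracle in data:
            pred = rerank(weights, nbest)
            if pred != oracle:
                for name, value in nbest[oracle].items():
                    weights[name] = weights.get(name, 0.0) + value
                for name, value in nbest[pred].items():
                    weights[name] = weights.get(name, 0.0) - value
    return weights


# Toy data (assumed): "baseline" stands for the decoder score, "dep_sim" for a
# source/target dependency-structure similarity feature.
data: List[Example] = [
    ([{"baseline": 1.0, "dep_sim": 0.2}, {"baseline": 0.8, "dep_sim": 0.9}], 1),
    ([{"baseline": 0.5, "dep_sim": 0.7}, {"baseline": 0.9, "dep_sim": 0.1}], 0),
]
weights = train_perceptron(data)
```

The learned weights give a rough indication of how much each feature contributes to choosing the preferred candidate, which is the sense in which weight learning "assesses the importance" of a feature.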

Contributor : Christian Boitet
Submitted on : Tuesday, May 27, 2014 - 3:51:34 PM
Last modification on : Wednesday, July 6, 2022 - 4:17:25 AM
Long-term archiving on : Wednesday, August 27, 2014 - 10:42:44 AM


  • HAL Id : tel-00996317, version 1


Vassilina Nikoulina. Modèle de traduction statistique à fragments enrichi par la syntaxe. Traitement du texte et du document. Université de Grenoble, 2010. Français. ⟨tel-00996317⟩


