
Going beyond the sentence : Contextual Machine Translation of Dialogue

Abstract : While huge progress has been made in machine translation (MT) in recent years, the majority of MT systems still rely on the assumption that sentences can be translated in isolation. As a result, these MT models only have access to context within the current sentence; context from other sentences in the same text and information about the scenario in which the text is produced remain out of reach. The aim of contextual MT is to overcome this limitation by providing ways of integrating extra-sentential context into the translation process. Context, whether it concerns the other sentences in the text (linguistic context) or the scenario in which the text is produced (extra-linguistic context), is important in a variety of cases, such as discourse-level and other referential phenomena. Successfully taking context into account in translation is challenging. Evaluating such strategies on their capacity to exploit context is also a challenge, since standard evaluation metrics are inadequate and even misleading when it comes to assessing improvements in contextual MT.

In this thesis, we propose a range of strategies to integrate both extra-linguistic and linguistic context into the translation process. We accompany our experiments with specifically designed evaluation methods, including new test sets and corpora. Our contextual strategies include pre-processing strategies designed to disambiguate the data on which MT models are trained, post-processing strategies that integrate context by post-editing MT outputs, and strategies in which context is exploited during translation proper. We cover a range of context-dependent phenomena, including anaphoric pronoun translation, lexical disambiguation, lexical cohesion and adaptation to properties of the scenario such as speaker gender and age. Our experiments, which cover both phrase-based statistical MT and neural MT, are applied in particular to English-to-French translation and focus specifically on the translation of informal written dialogues.

Cited literature [298 references]

https://tel.archives-ouvertes.fr/tel-02004683
Contributor : Abes Star
Submitted on : Saturday, February 2, 2019 - 1:03:42 AM
Last modification on : Wednesday, September 16, 2020 - 5:31:03 PM
Long-term archiving on : Friday, May 3, 2019 - 5:12:50 PM

File

70750_BAWDEN_2018_archivage.pd...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02004683, version 1

Citation

Rachel Bawden. Going beyond the sentence : Contextual Machine Translation of Dialogue. Computation and Language [cs.CL]. Université Paris-Saclay, 2018. English. ⟨NNT : 2018SACLS524⟩. ⟨tel-02004683⟩

Metrics

Record views : 735
File downloads : 2770