Multimodal Machine Translation

Abstract: Machine translation aims to automatically translate documents from one language to another without human intervention. With the advent of deep neural networks (DNN), neural approaches to machine translation came to dominate the field, reaching state-of-the-art performance for many languages. Neural machine translation (NMT) also revived interest in interlingual machine translation, since it naturally casts the task in an encoder-decoder framework that produces a translation by decoding a latent source representation. Combined with the architectural flexibility of DNNs, this framework paved the way for further research in multimodality, with the objective of augmenting the latent representations with other modalities such as vision or speech. This thesis focuses on a multimodal machine translation (MMT) framework that integrates a secondary visual modality to achieve better, visually grounded language understanding. I specifically worked with a dataset of images and their translated descriptions, where visual context can be useful for word sense disambiguation, missing word imputation, or gender marking when translating from a language with gender-neutral nouns into one with a grammatical gender system, as is the case from English to French. I propose two main approaches to integrating the visual modality: (i) a multimodal attention mechanism that learns to take into account both the sentence representations and convolutional visual representations, and (ii) a method that uses global visual feature vectors to prime the sentence encoders and the decoders. Automatic and human evaluations conducted on multiple language pairs demonstrate that the proposed approaches are beneficial.
Finally, I show that by systematically removing certain linguistic information from the input sentences, the true strength of both methods emerges: they successfully impute missing nouns and colors, and can even translate when parts of the source sentences are completely removed.
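The multimodal attention mechanism described in approach (i) can be illustrated with a minimal sketch. This is not the thesis implementation: all names, dimensions, and the Bahdanau-style additive scoring are illustrative assumptions. The idea shown is that textual annotation vectors and convolutional visual annotation vectors are projected into a common space, scored against the decoder state, and mixed into a single context vector via a shared softmax.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multimodal_attention(dec_state, txt_annots, img_annots,
                         W_txt, W_img, W_dec, v):
    """Toy multimodal attention (hypothetical names, not the thesis code).

    dec_state : (h_dec,)      current decoder hidden state
    txt_annots: (T_txt, d_t)  textual annotation vectors (encoder states)
    img_annots: (T_img, d_i)  convolutional visual annotation vectors
    W_*       : projections into a common attention space of size h
    v         : (h,)          scoring vector (additive attention)
    Returns the fused context vector and the attention weights.
    """
    # Project both modalities into a shared space and concatenate them,
    # so one softmax distributes attention across text AND image regions.
    annots = np.vstack([txt_annots @ W_txt.T, img_annots @ W_img.T])
    energies = np.tanh(annots + dec_state @ W_dec.T) @ v
    alpha = softmax(energies)          # weights over all annotations
    context = alpha @ annots           # modality-fused context vector
    return context, alpha

# Illustrative usage with random toy dimensions.
rng = np.random.default_rng(0)
h, h_dec, d_t, d_i = 8, 6, 10, 12
ctx, alpha = multimodal_attention(
    rng.standard_normal(h_dec),
    rng.standard_normal((5, d_t)),    # 5 source words
    rng.standard_normal((4, d_i)),    # 4 image regions
    rng.standard_normal((h, d_t)),
    rng.standard_normal((h, d_i)),
    rng.standard_normal((h, h_dec)),
    rng.standard_normal(h),
)
```

The single softmax over the concatenated annotations is one simple way to let the model trade off modalities per decoding step; the thesis explores more refined variants than this sketch.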
Document type: Thesis

Cited literature: 198 references

https://tel.archives-ouvertes.fr/tel-02309868
Contributor: Abes Star
Submitted on: Wednesday, October 9, 2019 - 4:15:11 PM
Last modification on: Thursday, October 31, 2019 - 1:23:50 AM

File

2019LEMA1016.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-02309868, version 1

Citation

Ozan Caglayan. Multimodal Machine Translation. Computation and Language [cs.CL]. Université du Maine, 2019. English. ⟨NNT : 2019LEMA1016⟩. ⟨tel-02309868⟩

Metrics

Record views: 224
File downloads: 112