Theses

Statistical Machine Translation of the Arabic Language

Abstract : The Arabic language has received a lot of attention in the machine translation community during the last decade. It is the official language of 25 countries and is spoken by more than 380 million people. Interest in Arabic and its dialects increased further after the Arab Spring and the political changes in the Arab countries. In this thesis, I worked on improving LIUM's Arabic-English machine translation system in the framework of the BOLT project.

I extended LIUM's phrase-based statistical machine translation system in several ways. Phrase-based systems are considered one of the best-performing approaches. They rely on two probabilistic models: a translation model and a language model.

My work on improving translation quality focused on three aspects. The first aspect is reducing the number of unknown words in the translated output. Within this aspect, I also addressed entities such as numbers and dates, which can be translated efficiently with transfer rules, and the transliteration of named entities. The second aspect is the adaptation of the translation model to the domain or genre of the translation task. Finally, I worked on improved language modeling based on neural network language models, also called continuous space language models, which are used to rescore the n-best translation hypotheses.

All the developed techniques have been thoroughly evaluated, and I took part in three international evaluations of the BOLT project.
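The abstract states that continuous space language models are used to rescore the n-best translation hypotheses. As an illustration only, here is a minimal Python sketch of generic n-best rescoring, assuming a log-linear model in which each hypothesis carries log-domain feature scores from the decoder and an extra language model contributes one more weighted log-probability. The names `Hypothesis` and `rescore_nbest`, the feature keys, and the toy scorer are hypothetical; the neural model is stood in for by a plain callable, and this is not LIUM's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    text: str
    # Hypothetical log-domain feature scores produced by the decoder,
    # e.g. {"tm": ..., "lm": ...}.
    features: Dict[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    weights: Dict[str, float],
    extra_lm_score: Callable[[str], float],
    extra_lm_weight: float,
) -> List[Hypothesis]:
    """Re-rank an n-best list by adding a weighted extra LM log-probability
    (e.g. from a continuous space LM) to the decoder's log-linear score."""

    def total(h: Hypothesis) -> float:
        # Log-linear combination of the existing decoder features.
        base = sum(weights[name] * value for name, value in h.features.items())
        # Add the new model's score as one more weighted feature.
        return base + extra_lm_weight * extra_lm_score(h.text)

    # Highest combined score first.
    return sorted(nbest, key=total, reverse=True)


if __name__ == "__main__":
    # Toy 2-best list; the numbers are made up for illustration.
    nbest = [
        Hypothesis("the house is big", {"tm": -2.0, "lm": -4.0}),
        Hypothesis("the house big is", {"tm": -1.5, "lm": -4.0}),
    ]
    weights = {"tm": 1.0, "lm": 1.0}
    # Hypothetical stand-in for a neural LM: a fixed lookup table.
    toy_scores = {"the house is big": -3.0, "the house big is": -6.0}
    for h in rescore_nbest(nbest, weights, lambda s: toy_scores[s], 0.5):
        print(h.text)
```

With these toy numbers the decoder alone prefers "the house big is" (-5.5 versus -6.0), but adding the extra model's score with weight 0.5 moves "the house is big" to the top (-7.5 versus -8.5). In practice the feature weights would be tuned on a development set rather than fixed by hand.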

Cited literature [112 references]

https://tel.archives-ouvertes.fr/tel-01316544
Contributor : Abes Star
Submitted on : Tuesday, May 17, 2016 - 11:57:27 AM
Last modification on : Tuesday, March 31, 2020 - 3:21:55 PM
Document(s) archived on : Thursday, August 18, 2016 - 10:26:57 AM

File

2015LEMA1018.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01316544, version 1

Citation

Walid Aransa. Statistical Machine Translation of the Arabic Language. Linguistics. Université du Maine, 2015. English. ⟨NNT : 2015LEMA1018⟩. ⟨tel-01316544⟩


Metrics

Record views : 648

File downloads : 2369