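Des suites de test pour la TA à un système d'exploitation de corpus alignés de documents et métadocuments multilingues, multiannotés et multimédia [From test suites for MT to an operating system for aligned corpora of multilingual, multi-annotated, and multimedia documents and metadocuments]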

Abstract: This thesis addresses three major challenges posed by the design and implementation of an “operating system for translation corpora”, abbreviated “sectra”. A sectra aims to provide a unified software environment that supports the exploitation of translation corpora by both humans and machines. The first challenge concerns the software environment needed to support MT evaluation. The second concerns collaborative and contributive support for human work on various corpora in multilingual contexts. The third concerns the software environment needed to exploit translation corpora within innovative applications (such as the iMAG Gateways, Notepad++, etc.). Designing and implementing such a system required introducing several new notions (such as a multilingualized and contextualized segment, or a corpus of multi-file documents) and general principles (pro-activity, delegation of services, etc.), and addressing problems at the conceptual level (for example, the extended definition of the “context” of a segment), the algorithmic level (for example, the programmability of corpus processing), and the programming level (for example, handling large volumes of data). A system called SECTra_w has been built and used successfully in several real projects involving MT evaluation, post-editing, and the multilingualization of websites and applications.

Keywords: Translation corpora, exploitation of translation corpora, collaborative software environment, MT evaluation, post-editing.
Cited literature: 145 references

https://tel.archives-ouvertes.fr/tel-00548196
Contributor : Cong-Phap Huynh
Submitted on : Sunday, December 19, 2010 - 5:23:51 PM
Last modification on : Friday, October 25, 2019 - 2:00:54 AM
Long-term archiving on : Sunday, March 20, 2011 - 2:41:22 AM

Identifiers

  • HAL Id : tel-00548196, version 1

Citation

Cong-Phap Huynh. Des suites de test pour la TA à un système d'exploitation de corpus alignés de documents et métadocuments multilingues, multiannotés et multimédia. Software Engineering [cs.SE]. Institut National Polytechnique de Grenoble - INPG, 2010. French. ⟨tel-00548196⟩
