Méthodes pour informatiser les langues et les groupes de langues « peu dotées » (Methods for computerizing "under-resourced" languages and groups of languages)

Abstract : As of 2004, fewer than 1% of the world's 6,800 languages benefit from a high level of computerization, meaning a broad range of services from text processing to machine translation. This thesis focuses on the remaining languages, the pi-languages, and proposes solutions to remedy their digital underdevelopment. The first part, intended to show the complexity of the problem, presents the diversity of languages, the technologies used, and the approaches of the various actors: linguistic populations, software publishers, the United Nations, States, and others. We propose a technique for measuring the degree of computerization of a language, the sigma-index, as well as several optimization methods. The second part deals with the computerization of the Laotian language and presents the concrete results obtained for this language by applying the methods described previously. This work helped improve the sigma-index of Laotian by approximately 4 points; the index is currently evaluated at 8.7/20. In the third part, we show that an approach by groups of languages can reduce computerization costs through a modular architecture that combines existing general-purpose software with language-specific complements. For the most language-dependent parts, complementary generic lingware tools allow the populations concerned to computerize their languages themselves. We validated this method by applying it to the syllabic segmentation of Southeast Asian languages with unsegmented writing systems, such as Burmese, Khmer, Laotian and Siamese (Thai).
Document type :
Theses

Cited literature : 159 references

https://tel.archives-ouvertes.fr/tel-00006313
Contributor : Vincent Berment
Submitted on : Wednesday, June 23, 2004 - 5:23:28 AM
Last modification on : Friday, November 6, 2020 - 4:09:14 AM
Long-term archiving on : Friday, April 2, 2010 - 8:19:35 PM

Identifiers

  • HAL Id : tel-00006313, version 1

Collections

UJF | CNRS | IMAG | UGA

Citation

Vincent Berment. Méthodes pour informatiser les langues et les groupes de langues « peu dotées ». Other [cs.OH]. Université Joseph-Fourier - Grenoble I, 2004. French. ⟨tel-00006313⟩

Metrics

Record views

786

File downloads

5070