Reconnaissance automatique de la parole pour des langues peu dotées (Automatic Speech Recognition for Under-Resourced Languages)

Abstract : Computers are now heavily used to communicate via text and speech. Text processing tools, electronic dictionaries, and more advanced systems such as text-to-speech or dictation are readily available for several languages. There are, however, more than 6900 languages in the world, and only a small number of them possess the resources required to implement Human Language Technologies (HLT). As a result, HLT work mostly concerns languages for which large resources are available or which have suddenly become of interest because of the economic or political scene. By contrast, languages from developing countries or minorities have received less attention in past years. One way of reducing this "language divide" is to do more research on the portability of HLT for multilingual applications.
Among HLT, we are particularly interested in Automatic Speech Recognition (ASR). We therefore investigate new techniques and tools for the rapid development of ASR systems for under-resourced languages, or π-languages, when only limited resources are available. These languages are typically spoken in developing countries, but can nevertheless have many speakers. In this work, we study Vietnamese and Khmer, which are spoken by 67 million and 13 million people respectively, but for which no speech processing services exist.
Firstly, given the statistical nature of the methods used in ASR, a large amount of resources (vocabularies, text corpora, transcribed speech corpora, phonetic dictionaries) is crucial for building an ASR system for a new language. Concerning text resources, a new methodology for fast text corpus acquisition for π-languages is proposed and applied to Vietnamese and Khmer. Specific problems in text acquisition and text processing for π-languages, such as text normalization, text segmentation, and text filtering, are addressed. To speed up the development of text processing tools for a new π-language, an open source generic toolkit named CLIPS-Text-Tk was developed during this thesis.
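As an illustration only, the normalization, segmentation, and filtering steps mentioned above could be chained roughly as follows. This is a minimal Python sketch and not the CLIPS-Text-Tk implementation: the function names, the punctuation-based sentence splitter, the whitespace tokenization, and the `min_in_vocab` threshold are all assumptions introduced for this example.

```python
import re
import unicodedata


def normalize(text):
    """Apply Unicode NFC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def split_sentences(text):
    """Split on sentence-final punctuation (a crude, hypothetical rule)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def keep_sentence(sentence, vocabulary, min_in_vocab=0.8):
    """Keep a sentence if enough whitespace-separated tokens are in the vocabulary."""
    tokens = sentence.split()
    if not tokens:
        return False
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens) >= min_in_vocab


def build_corpus(raw_text, vocabulary, min_in_vocab=0.8):
    """Normalize raw text, split it into sentences, and filter out noisy ones."""
    sentences = split_sentences(normalize(raw_text))
    return [s for s in sentences if keep_sentence(s, vocabulary, min_in_vocab)]
```

Whitespace tokenization is only a placeholder here. Khmer is written without spaces between words, so a real pipeline for it would need a dedicated word segmentation step before any vocabulary-based filtering.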
Secondly, for acoustic modeling, we focus on the use of acoustic-phonetic unit similarities for porting multilingual acoustic models to new languages. A method for estimating the similarity between two phonemes is first proposed. Based on these phoneme similarities, estimation methods for polyphone similarity and clustered polyphonic model similarity are investigated. For a new language, a source/target acoustic-phonetic unit mapping table can be constructed with these similarity measures. Clustered models in the target language are then duplicated from the nearest clustered models in the source language and adapted to the target language with limited data. Results obtained for Vietnamese demonstrate the feasibility and efficiency of these methods. Grapheme-based acoustic modeling, which avoids building a pronunciation dictionary, is also investigated. Finally, the whole methodology is applied to design a Khmer ASR system, which reaches 70% word accuracy and was developed in only five months.
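To make the idea of a source/target mapping table concrete, here is a minimal Python sketch. It assumes that phoneme similarities are already available as a dictionary of scores in [0, 1]; the abstract does not specify how they are estimated. Averaging position-wise phoneme similarities to score two polyphones is a simplification chosen for this example, not necessarily the measure used in the thesis, and all names are hypothetical.

```python
def phoneme_similarity(sim, a, b):
    """Look up a symmetric similarity; identical phonemes default to 1.0."""
    default = 1.0 if a == b else 0.0
    return sim.get((a, b), sim.get((b, a), default))


def polyphone_similarity(sim, p, q):
    """Average position-wise phoneme similarity of two equal-length polyphones."""
    if not p or len(p) != len(q):
        return 0.0
    total = sum(phoneme_similarity(sim, a, b) for a, b in zip(p, q))
    return total / len(p)


def build_mapping(sim, target_units, source_units):
    """Map each target unit (a tuple of phonemes) to its nearest source unit."""
    return {
        t: max(source_units, key=lambda s: polyphone_similarity(sim, t, s))
        for t in target_units
    }
```

Each target unit would then be initialized from the acoustic model of the source unit it maps to, and adapted with whatever target-language speech data is available.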
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00081061
Contributor : Viet Bac Le
Submitted on : Wednesday, June 21, 2006 - 7:10:44 PM
Last modification on : Thursday, January 11, 2018 - 6:14:32 AM
Long-term archiving on : Monday, April 5, 2010 - 9:29:06 PM

Identifiers

  • HAL Id : tel-00081061, version 1

Collections

UJF | IMAG | UGA

Citation

Viet Bac Le. Reconnaissance automatique de la parole pour des langues peu dotées. Human-Computer Interaction [cs.HC]. Université Joseph-Fourier - Grenoble I, 2006. French. ⟨tel-00081061⟩
