
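Extraction lexicale bilingue à partir de textes médicaux comparables : application à la recherche d'information translangue [Bilingual lexicon extraction from comparable medical texts: application to cross-language information retrieval]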

Abstract : With the rapid growth of online medical information in many languages, accessing and processing that information has become a pressing issue. Such processing generally assumes that large multilingual lexical resources are available for each language pair. Keeping these resources up to date is therefore an important issue, especially in a rapidly evolving domain such as medicine. This thesis focuses on domain-specific bilingual lexicon extraction from online medical texts. Our goal is to develop a translation method for bilingual lexicon acquisition from comparable corpora and for query translation in cross-language information retrieval (CLIR). We present a novel approach based on the symmetry of word distributions. Traditional approaches to bilingual lexicon extraction from comparable corpora rest on the assumption that words that are translations of each other have similar distributional profiles across languages. However, they extract in one direction only, from the source language to the target language. The basic intuition behind symmetrical distribution is that reciprocal distributional similarity between two words of different languages is an effective criterion for identifying translational affinity between them. We first evaluated our model on French-English medical lexicon extraction, and then used the extracted lexicon for query translation and expansion in CLIR. The results show that our symmetrical-distribution approach performs better than the traditional approach to bilingual lexicon extraction. For query translation and expansion, our model improves retrieval results over the dictionary-based method only in a semi-supervised mode.
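The reciprocal-similarity intuition described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy context vectors, the pivot features ("cardiac", "chronic"), the cosine measure, and the harmonic mean of forward and backward ranks are all assumptions chosen for illustration, and the vectors are assumed to have already been mapped into a shared feature space.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_of(query_vec, candidates, wanted):
    """1-based rank of `wanted` among `candidates`, sorted by similarity to query_vec."""
    ordered = sorted(candidates,
                     key=lambda w: cosine(query_vec, candidates[w]),
                     reverse=True)
    return ordered.index(wanted) + 1

def symmetric_scores(src_word, src_vecs, tgt_vecs):
    """Rank target words by the harmonic mean of forward and backward ranks.

    The harmonic mean is an illustrative assumption; lower scores are better.
    """
    scores = {}
    for tgt_word, tgt_vec in tgt_vecs.items():
        r_fwd = rank_of(src_vecs[src_word], tgt_vecs, tgt_word)  # source -> target
        r_bwd = rank_of(tgt_vec, src_vecs, src_word)             # target -> source
        scores[tgt_word] = 2 * r_fwd * r_bwd / (r_fwd + r_bwd)
    return sorted(scores.items(), key=lambda kv: kv[1])

# Hypothetical toy vectors over two shared pivot features.
SRC = {
    "coeur":   {"cardiac": 10, "chronic": 6},
    "maladie": {"cardiac": 20, "chronic": 17},
    "douleur": {"cardiac": 10, "chronic": 10},
}
TGT = {
    "heart":   {"cardiac": 10, "chronic": 3},
    "disease": {"cardiac": 10, "chronic": 8},
    "pain":    {"cardiac": 2,  "chronic": 10},
}

ranked = symmetric_scores("coeur", SRC, TGT)
print(ranked)
```

In this toy data, the forward direction alone ranks "disease" first for "coeur", but "disease" is closer to the other two source words than to "coeur", so the backward rank penalizes it and "heart" comes out on top.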
Contributor : Yun-Chuang Chiao
Submitted on : Thursday, December 9, 2004 - 10:03:14 PM
Last modification on : Wednesday, December 9, 2020 - 3:04:58 PM
Long-term archiving on : Friday, April 2, 2010 - 8:58:17 PM


  • HAL Id : tel-00007704, version 1


Yun-Chuang Chiao. Extraction lexicale bilingue à partir de textes médicaux comparables : application à la recherche d'information translangue. Life Sciences [q-bio]. Université Pierre et Marie Curie - Paris VI, 2004. French. ⟨tel-00007704⟩


