Theses

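Bilingual lexicon extraction from comparable medical texts: application to cross-language information retrieval (original title: Extraction lexicale bilingue à partir de textes médicaux comparables : application à la recherche d'information translangue)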

Abstract : The rapid growth of online medical information in many languages raises the question of how to access and process that information across languages. Such processing generally assumes that large multilingual lexical resources are available for each language pair. Keeping these resources up to date is therefore an important issue, especially in a rapidly evolving domain such as medicine. This thesis focuses on domain-specific bilingual lexicon extraction from online medical texts. Our goal is to develop a translation method for bilingual lexicon acquisition from comparable corpora and for query translation in cross-language information retrieval (CLIR). We present a novel approach based on the symmetry of word distributions. Traditional approaches to bilingual lexicon extraction from comparable corpora assume that words that are translations of each other have similar distributional profiles across languages. However, they extract in one direction only, from the source language to the target language. The basic intuition behind symmetrical distribution is that reciprocal distributional similarity between two words of different languages is an effective criterion for identifying translational affinity between them. We evaluated our model in two ways: first on French-English medical lexicon extraction, and second by using the extracted lexicon for query translation and expansion in CLIR. The results show that our approach, which exploits symmetrical distribution, performs better than the traditional approach to bilingual lexicon extraction. For query translation and expansion, our model improves retrieval results over the dictionary-based method only in a semi-supervised mode.
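The idea of reciprocal distributional similarity can be illustrated with a small sketch. This is not the thesis's actual model, which the abstract does not specify. It assumes that context vectors for both languages have already been mapped into a shared space (for example through a seed dictionary). The function names, the toy vectors, and the rule for combining forward and backward ranks are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranks(query_vec, candidates):
    """Map each candidate word to its 1-based rank by decreasing similarity."""
    ordered = sorted(
        candidates,
        key=lambda w: cosine(query_vec, candidates[w]),
        reverse=True,
    )
    return {w: i + 1 for i, w in enumerate(ordered)}


def symmetric_candidates(src_word, src_vectors, tgt_vectors):
    """Score target words using both extraction directions.

    Assumption: the score is the reciprocal of the mean of the forward rank
    (source -> target) and the backward rank (target -> source). This
    combination rule is illustrative only.
    """
    forward = ranks(src_vectors[src_word], tgt_vectors)
    scored = []
    for tgt_word, fwd_rank in forward.items():
        # Rank of the source word when searching back from the target word.
        bwd_rank = ranks(tgt_vectors[tgt_word], src_vectors)[src_word]
        scored.append((tgt_word, 2.0 / (fwd_rank + bwd_rank)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy context vectors (hypothetical counts over shared context dimensions).
src_vectors = {
    "coeur": {"cardiac": 5, "blood": 3},
    "foie": {"hepatic": 4, "blood": 2},
}
tgt_vectors = {
    "heart": {"cardiac": 6, "blood": 2},
    "liver": {"hepatic": 5, "blood": 3},
}

print(symmetric_candidates("coeur", src_vectors, tgt_vectors))
```

A one-direction extractor would use only the forward rank. In this sketch, a candidate scores highest when each word is also the other's nearest neighbour in the reverse direction, which is the intuition the abstract describes.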

https://tel.archives-ouvertes.fr/tel-00007704
Contributor : Yun-Chuang Chiao
Submitted on : Thursday, December 9, 2004 - 10:03:14 PM
Last modification on : Friday, May 29, 2020 - 4:02:35 PM
Long-term archiving on : Friday, April 2, 2010 - 8:58:17 PM

Identifiers

  • HAL Id : tel-00007704, version 1

Citation

Yun-Chuang Chiao. Extraction lexicale bilingue à partir de textes médicaux comparables : application à la recherche d'information translangue. Sciences du Vivant [q-bio]. Université Pierre et Marie Curie - Paris VI, 2004. Français. ⟨tel-00007704⟩

Metrics

Record views : 426
File downloads : 1500