Usage of non-conventional resources and contributive methods to bridge the terminological gap between languages by developing multilingual "preterminologies"

Abstract : Our motivation is to bridge the terminological gap that grows with the massive production of new concepts (50 daily) in various domains, for which terms are often first coined in a well-resourced language such as English or French. Finding equivalent terms in other languages is necessary for many applications, such as cross-language information retrieval (CLIR) and machine translation (MT). This task is very difficult, especially for some widely used languages such as Arabic, because (1) only a small proportion of new terms is properly recorded by terminologists, and for few languages; (2) specific communities continuously create equivalent terms without normalizing or even recording them (latent terminology); (3) in many cases, no equivalent terms are created, formally or informally (absence of terminology). This thesis proposes to replace the impossible goal of continuously building an up-to-date, complete and high-quality terminology for a large number of languages with that of building a preterminology, using unconventional methods and passive or active contributions by communities of Internet users: potential parallel terms are extracted not only from parallel or comparable texts, but also from logs of visits to Web sites such as DSR (Digital Silk Road), and from data produced by serious games. A preterminology is a new kind of lexical resource that can be easily constructed and has good coverage. Following a growing trend in computational lexicography and NLP in general, we represent a multilingual preterminology by a graph structure (Multilingual Preterminological Graph, MPG), where nodes bear preterms and arcs bear simple preterminological relations (monolingual synonymy, translation, generalization, specialization, etc.) that approximate usual terminological (or ontological) relations. A complete System for Eliciting Preterminology (SEpT) has been developed to build and maintain MPGs.
Passive approaches were tested by developing an MPG for the DSR cultural Web site and another for the domain of Arabic oneirology; the resulting resources achieved good informational and linguistic coverage. The indirect active contribution approach has been under test for 8 to 9 months, using the Arabic instance of the JeuxDeMots serious game.
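The graph structure described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the SEpT implementation: the class and method names (`Preterm`, `PretermGraph`, `add_relation`, `related`), the numeric weight that counts repeated observations of a relation, the automatic storage of the inverse arc, and the sample preterms are all hypothetical and do not come from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Preterminological relations named in the abstract."""
    SYNONYMY = "synonymy"
    TRANSLATION = "translation"
    GENERALIZATION = "generalization"
    SPECIALIZATION = "specialization"


# Assumption: each relation has an inverse, stored on the reverse arc.
INVERSE = {
    Relation.SYNONYMY: Relation.SYNONYMY,
    Relation.TRANSLATION: Relation.TRANSLATION,
    Relation.GENERALIZATION: Relation.SPECIALIZATION,
    Relation.SPECIALIZATION: Relation.GENERALIZATION,
}


@dataclass(frozen=True)
class Preterm:
    """A node of the graph: a candidate term in a given language."""
    text: str
    lang: str


class PretermGraph:
    """Toy multilingual preterminological graph (hypothetical design)."""

    def __init__(self):
        # node -> {(relation, neighbour): accumulated weight}
        self._arcs = defaultdict(lambda: defaultdict(float))

    def add_relation(self, source, target, relation, weight=1.0):
        """Record that `target` stands in `relation` to `source`.

        Repeated observations add up; the weight is an illustrative
        assumption, not something the abstract specifies.
        """
        self._arcs[source][(relation, target)] += weight
        self._arcs[target][(INVERSE[relation], source)] += weight

    def related(self, node, relation, lang=None):
        """Return (neighbour, weight) pairs, heaviest first."""
        hits = [
            (neighbour, weight)
            for (rel, neighbour), weight in self._arcs.get(node, {}).items()
            if rel is relation and (lang is None or neighbour.lang == lang)
        ]
        return sorted(hits, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    graph = PretermGraph()
    en = Preterm("silk road", "en")
    fr = Preterm("route de la soie", "fr")
    graph.add_relation(en, fr, Relation.TRANSLATION)
    print(graph.related(en, Relation.TRANSLATION, lang="fr"))
```

Storing arcs per node keeps lookups of translation candidates cheap, and accumulating a weight gives one simple way to rank candidates that several contributions agree on; other designs are equally plausible.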
Document type :
Theses
Computer Science. Université Joseph-Fourier - Grenoble I, 2010. English


https://tel.archives-ouvertes.fr/tel-00583682
Contributor : Mohammad Daoud
Submitted on : Wednesday, April 6, 2011 - 12:00:05 PM
Last modification on : Wednesday, April 6, 2011 - 1:35:17 PM

Identifiers

  • HAL Id : tel-00583682, version 1

Citation

Mohammad Daoud. Usage of non-conventional resources and contributive methods to bridge the terminological gap between languages by developing multilingual "preterminologies". Computer Science. Université Joseph-Fourier - Grenoble I, 2010. English. <tel-00583682>

Metrics

Record views : 337

Document downloads : 418