Grammaires locales pour l'analyse automatique de textes : méthodes de construction et outils de gestion
(Local grammars for automatic text analysis: construction methods and management tools)

Abstract : Many researchers in Natural Language Processing have shown the importance of descriptive linguistics, and especially of large-scale databases of fine-grained linguistic components, namely lexicons and grammars. This approach has a drawback: it requires long-term investment. Methods and computational tools are therefore needed to help build such data, which must be directly applicable to texts. This work focuses on a specific linguistic representation: local grammars, which describe precise, local constraints in the form of graphs. Two issues arise: (1) how to efficiently build precise, complete and text-applicable grammars, and (2) how to manage their growing number and their dispersion. To address the first problem, a set of simple, empirical methods is presented, based on the lexicon-grammar methodology of M. Gross (1975). The whole process of linguistic analysis and formal representation is described through two original phenomena: measurement expressions (un immeuble d'une hauteur de 20 mètres, 'a building 20 metres high') and locative prepositional phrases containing geographical proper names (à l'île de la Réunion, 'on Réunion Island'). Each phenomenon is reduced to elementary sentences, which makes it possible to classify them semantically according to formal criteria. The syntactic behavior of these sentences is studied systematically according to the lexical values of their elements. The observed properties are then encoded either directly as graphs, using a graph editor, or as syntactic matrices that are semi-automatically converted into graphs following E. Roche (1993). These studies led to new conversion algorithms for matrix systems, in which linguistic information is encoded across several matrices. For the second issue, a prototype online library of local grammars has been designed and implemented.
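The conversion of a syntactic matrix (lexicon-grammar table) into graphs, attributed above to E. Roche (1993), can be illustrated with a minimal sketch for a single table. It assumes the convention used by INTEX/Unitex-style reference graphs: a box labelled `@X@` refers to column X; a `+` cell keeps the box as an empty transition `<E>`, a `-` cell removes it, `@!X@` inverts this, and any other cell value becomes the box label. The reference graph, the columns A/B/C and their meaning, and the names `Box`, `instantiate`, `paths` and `REF` are hypothetical, and the multi-matrix algorithms developed in the thesis are not modelled here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# A box label that is exactly a column variable: @A@ or its negation @!A@.
VAR = re.compile(r"@(!?)([A-Z]+)@")


@dataclass
class Box:
    label: str
    succ: list[int] = field(default_factory=list)


def instantiate(ref: dict[int, Box], row: dict[str, str]) -> dict[int, Box]:
    """Build one graph from the reference graph `ref` and one table row."""
    out: dict[int, Box] = {}
    for bid, box in ref.items():
        m = VAR.fullmatch(box.label)
        if m is None:  # plain box: copied unchanged
            out[bid] = Box(box.label, list(box.succ))
            continue
        negated, col = m.group(1) == "!", m.group(2)
        value = row.get(col, "-")  # assumption: a missing cell counts as "-"
        if value in ("+", "-"):
            # "+" keeps the box as an empty transition, "-" drops it; "!" inverts.
            if (value == "+") != negated:
                out[bid] = Box("<E>", list(box.succ))
        else:  # lexical cell: its content becomes the box label
            out[bid] = Box(value, list(box.succ))
    for box in out.values():  # remove transitions towards dropped boxes
        box.succ = [s for s in box.succ if s in out]
    return out


def paths(g: dict[int, Box], start: int = 0, final: int = 1) -> list[list[str]]:
    """List label sequences from `start` to `final` (acyclic graphs only), skipping <E>."""
    if start not in g:
        return []
    if start == final:
        return [[]]
    head = [] if g[start].label == "<E>" else [g[start].label]
    return [head + tail for s in g[start].succ for tail in paths(g, s, final)]


# Hypothetical reference graph: box 0 is initial, box 1 is final.
# Column A = measurement noun, B = unit, C = "+" if the noun is feminine.
REF = {
    0: Box("<E>", [2, 3]),
    2: Box("@C@", [4]),
    3: Box("@!C@", [5]),
    4: Box("d'une", [6]),
    5: Box("d'un", [6]),
    6: Box("@A@", [7]),
    7: Box("de", [8]),
    8: Box("<NB>", [9]),
    9: Box("@B@", [1]),
    1: Box("<E>"),
}

if __name__ == "__main__":
    for row in ({"A": "hauteur", "B": "mètres", "C": "+"},
                {"A": "poids", "B": "kilos", "C": "-"}):
        print(paths(instantiate(REF, row)))
```

With these two hypothetical rows, the first graph accepts only the path `d'une hauteur de <NB> mètres` and the second only `d'un poids de <NB> kilos`: one reference graph plus one table row yields one specialised graph, which is what makes the conversion semi-automatic.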
The objective of this library is to centralize and distribute the local grammars built within the RELEX network of laboratories. A set of tools was developed that allows users both to store new graphs and to search for graphs according to various criteria. Implementing a grammar search engine led to the investigation of a new field of information retrieval: searching for linguistic information in sets of local grammars.
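Searching a library of local grammars by their content can be sketched with an inverted index from the words appearing in box labels to the graphs that contain them, combined with a metadata filter. This is only an illustrative assumption, not the search engine described in the thesis: the classes `GraphRecord` and `GrammarLibrary`, the `language` field, and the tokenisation rule (alternatives separated by `+` inside a box, then whitespace) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphRecord:
    """Hypothetical metadata kept for one stored local grammar (graph)."""
    name: str
    language: str
    labels: tuple[str, ...]  # contents of the graph's boxes


def tokens(label: str) -> set[str]:
    """Split a box label into lower-cased words ('+' separates alternatives)."""
    return {tok.lower() for alt in label.split("+") for tok in alt.split()}


class GrammarLibrary:
    """Stores graphs and finds those whose boxes contain all query words."""

    def __init__(self) -> None:
        self._records: dict[str, GraphRecord] = {}
        self._index: defaultdict[str, set[str]] = defaultdict(set)

    def store(self, rec: GraphRecord) -> None:
        if rec.name in self._records:  # keep the index consistent
            raise ValueError(f"graph {rec.name!r} is already stored")
        self._records[rec.name] = rec
        for label in rec.labels:
            for tok in tokens(label):
                self._index[tok].add(rec.name)

    def search(self, words: list[str], language: str | None = None) -> list[str]:
        names = set(self._records)
        for w in words:  # conjunctive query: intersect the posting sets
            names &= self._index.get(w.lower(), set())
        if language is not None:  # optional metadata criterion
            names = {n for n in names if self._records[n].language == language}
        return sorted(names)
```

For example, after storing a hypothetical measurement graph containing the box `hauteur+largeur` and a toponym graph containing `la Réunion+Madagascar`, `search(["Réunion"])` returns only the toponym graph. An inverted index keeps lookups proportional to the size of the posting sets rather than to the number of stored graphs.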
Document type :
Theses

Cited literature: 97 references

https://tel.archives-ouvertes.fr/tel-00626252
Contributor : Lingu Ligm
Submitted on : Saturday, September 24, 2011 - 12:05:06 PM
Last modification on : Tuesday, June 5, 2018 - 10:10:04 AM
Long-term archiving on : Sunday, December 25, 2011 - 2:20:31 AM

Identifiers

  • HAL Id : tel-00626252, version 1

Citation

Mathieu Constant. Grammaires locales pour l'analyse automatique de textes : méthodes de construction et outils de gestion. Other [cs.OH]. Université Paris-Est, 2003. French. ⟨tel-00626252⟩
