Extended local grammars: principles, implementation and applications for information extraction [Grammaires locales étendues : principes, mise en œuvre et applications pour l’extraction de l’information]

Abstract : Local grammars are a formalism for describing linguistic phenomena and are commonly represented as directed graphs. They are used to recognize and extract patterns in text, but they have inherent limits in handling unexpected variations and in accessing exogenous knowledge, that is, information drawn during the analysis from external resources that may be useful to normalize, enhance, validate, or link the recognized patterns. In this thesis, we introduce the notion of extended local grammar, a formalism that extends the classic model of local grammars. The means are twofold. On the one hand, arbitrary conditional functions, called extended functions, are added; they are not predefined and are evaluated outside the grammar. On the other hand, the parsing engine is allowed to trigger events that can also be processed as extended functions. The work is divided into three parts. First, we study the principles governing the construction of extended local grammars. Then, we present a proof of concept of a corpus-processing tool that implements the proposed formalism. Finally, we study techniques for extracting information from both well-formed and noisy texts. We focus on coupling external resources and non-symbolic methods in the construction of our grammars, and we highlight the suitability of this approach for overcoming the inherent limitations of classical local grammars.
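
As a rough illustration of the two mechanisms named in the abstract, the sketch below models a local grammar as a small directed graph whose transitions either match a token literally or call an extended function registered outside the grammar, and it lets the engine raise a match event that is handled by the same kind of function. This is not the tool or the syntax presented in the thesis: the names `EXTENDED`, `extended`, `is_city`, `on_match`, `GRAMMAR`, and the `@` label prefix are all hypothetical, and the city list stands in for an external resource.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical registry of extended functions, defined outside the grammar.
EXTENDED: Dict[str, Callable] = {}

def extended(name: str):
    """Register a function under a name the grammar can refer to."""
    def register(fn: Callable) -> Callable:
        EXTENDED[name] = fn
        return fn
    return register

KNOWN_CITIES: Set[str] = {"paris", "lyon"}  # stand-in for an external resource
MATCH_LOG: List[str] = []

@extended("is_city")
def is_city(token: str) -> bool:
    # Condition evaluated outside the grammar, against exogenous knowledge.
    return token.lower() in KNOWN_CITIES

@extended("on_match")
def on_match(span: List[str]) -> None:
    # Event handler: the engine calls it each time a pattern is recognized.
    MATCH_LOG.append(" ".join(span))

# Grammar as a directed graph: state -> [(label, next_state)].
# Labels starting with "@" call an extended function (assumed convention);
# any other label must equal the lowercased token.
GRAMMAR: Dict[int, List[Tuple[str, int]]] = {
    0: [("in", 1)],
    1: [("@is_city", 2)],
}
FINAL: Set[int] = {2}

def match_at(tokens: List[str], start: int) -> Optional[int]:
    """Return the exclusive end index of a match starting at `start`, or None."""
    def walk(state: int, pos: int) -> Optional[int]:
        if state in FINAL:
            return pos
        if pos >= len(tokens):
            return None
        for label, nxt in GRAMMAR.get(state, []):
            if label.startswith("@"):
                ok = bool(EXTENDED[label[1:]](tokens[pos]))
            else:
                ok = tokens[pos].lower() == label
            if ok:
                end = walk(nxt, pos + 1)
                if end is not None:
                    return end
        return None
    return walk(0, start)

def extract(tokens: List[str]) -> List[List[str]]:
    """Scan left to right, collect matched spans, and trigger the match event."""
    spans: List[List[str]] = []
    i = 0
    while i < len(tokens):
        end = match_at(tokens, i)
        if end is None:
            i += 1
            continue
        span = tokens[i:end]
        spans.append(span)
        handler = EXTENDED.get("on_match")
        if handler is not None:
            handler(span)
        i = end
    return spans
```

Calling `extract("She lives in Paris and works in Berlin".split())` keeps only the span whose second token passes the external check. Changing what counts as a city requires editing `KNOWN_CITIES` or `is_city`, not the graph, which is the kind of decoupling the abstract describes.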
Document type :
Theses

Cited literature : 123 references

https://tel.archives-ouvertes.fr/tel-01799012
Contributor : Abes Star
Submitted on : Thursday, May 24, 2018 - 11:19:07 AM
Last modification on : Monday, February 4, 2019 - 6:08:39 PM
Long-term archiving on : Saturday, August 25, 2018 - 2:00:37 PM

File

TH2017PESC1075.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01799012, version 1

Citation

Cristian Martinez. Grammaires locales étendues : principes, mise en œuvre et applications pour l’extraction de l’information. Traitement du texte et du document. Université Paris-Est, 2017. French. ⟨NNT : 2017PESC1075⟩. ⟨tel-01799012⟩
