Repérage et typage d'expressions temporelles pour l'annotation sémantique automatique de pages Web - Application au e-tourisme

Abstract : This thesis presents Adetoa, a system that automatically locates temporal expressions in Web pages and tags them with semantic annotations, applied to the field of e-tourism. A detailed linguistic study revealed that temporal information in tourism Web pages is expressed in complex ways and has specific properties. A semiotic study of these pages showed that their data are organised in varied ways, with no regularity, so automatic analysis of their structure is difficult and sometimes impossible. These analyses led to the development of a large number of transducers (under Unitex) for the extraction and mark-up tasks; they can be regarded as a generally applicable resource. Other tourist information, such as tourist objects and addresses, is also extracted. Linking transducers were developed to group all the information concerning a single tourist destination. An annotation scheme and transformation rules were developed to mark up the annotations and to integrate Adetoa into the processing chain of the Eiffel project. The annotation scheme is based on a tourism ontology but does not replicate it directly, which allows the expressions to be characterised accurately at the linguistic level. The ontology was then adapted accordingly, so that the information can be included more easily in the corresponding knowledge base. The evaluation of Adetoa, detailed in the last chapter, showed satisfactory results, both on a theoretical level and for industrial purposes.
Document type :
Theses
Cited literature [45 references]

https://tel.archives-ouvertes.fr/tel-00530785
Contributor : Stéphanie Weiser
Submitted on : Friday, October 29, 2010 - 7:48:53 PM
Last modification on : Thursday, July 5, 2018 - 1:26:45 AM
Long-term archiving on : Sunday, January 30, 2011 - 3:08:56 AM

Identifiers

  • HAL Id : tel-00530785, version 1

Citation

Stéphanie Weiser. Repérage et typage d'expressions temporelles pour l'annotation sémantique automatique de pages Web - Application au e-tourisme. Linguistics. Université de Nanterre - Paris X, 2010. French. ⟨tel-00530785⟩
