Modélisation, indexation et recherche de documents structurés

Abstract : Electronic document retrieval systems, whether database management systems or information retrieval systems, do not exploit the full richness of documents. The former do not extract the semantics of documents and are guided only by document structure, while the latter neglect structure and rely on methods that do not suit the novel features of structured documents. Our goal is to reconcile these different ways of accessing electronic documents. We also want to provide access to every part of a document that could answer the user's information need. Our work comprises two steps: the definition of a structured document model able to host monomedia and multimedia components (text and still images), and the design of a structural indexing process suited to query processing. The structured document model is based on three structural relations drawn from textual documents: the composition relation, the sequence relation, and the reference relation. These relations define the syntactic organisation of document parts, called structural elements. Starting from this organisation, we bring out its dual, the semantic organisation, and we exploit its features to define the properties of the descriptors attached to structural elements. We formalise these properties through the notion of attribute scope and the underlying classification of attributes. The scope of an attribute indicates which structural elements are concerned by that attribute and its values. Although we do not modify the information that describes a document and its parts, we make it explicit and propose a better distribution of it, in which the pieces of information (attributes and their values) depend on one another. The query process uses these dependencies to give access to relevant documents or document parts. Our prototype, my Personal Daily News, demonstrates the validity of our work: it supports queries mixing structure and content over French daily newspapers. With this approach, structural elements become accessible, and querying becomes considerably more flexible because users need only partial knowledge of the document structure.
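
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual model or prototype: the class `Element`, the two scope labels `"local"` and `"down"`, the rule that a `"down"` attribute also concerns every component of the element carrying it, and the sample newspaper data are all assumptions made for illustration. Composition and sequence are represented by an ordered list of children, and the reference relation by a separate list of links.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass(eq=False)
class Element:
    """A structural element (hypothetical representation)."""
    name: str
    # attribute name -> (value, scope); scope is "local" or "down" (assumed labels)
    attributes: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # composition relation; list order stands for the sequence relation
    children: List["Element"] = field(default_factory=list)
    # reference relation (links to other elements)
    references: List["Element"] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def add(self, child: "Element") -> "Element":
        """Attach a component to this element and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def effective(self) -> Dict[str, str]:
        """Attributes that concern this element under the assumed scope rule."""
        result: Dict[str, str] = {}
        node = self.parent
        while node is not None:
            for key, (value, scope) in node.attributes.items():
                if scope == "down":
                    # the nearest ancestor wins over more distant ones
                    result.setdefault(key, value)
            node = node.parent
        # the element's own attributes override inherited ones
        for key, (value, _scope) in self.attributes.items():
            result[key] = value
        return result

    def walk(self) -> Iterator["Element"]:
        """Visit this element and all its components in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


def search(root: Element, **criteria: str) -> List[Element]:
    """Return every element whose effective attributes match all criteria."""
    return [
        e for e in root.walk()
        if all(e.effective().get(k) == v for k, v in criteria.items())
    ]


# Invented sample data, for illustration only.
paper = Element("newspaper", {"language": ("fr", "down"),
                              "title": ("Example Daily", "local")})
article = paper.add(Element("article", {"topic": ("sport", "down")}))
para = article.add(Element("paragraph"))
photo = article.add(Element("image", {"caption": ("finish line", "local")}))
para.references.append(photo)

hits = search(paper, topic="sport", language="fr")
print([e.name for e in hits])
```

Under these assumptions the query does not name a structural level, so it returns the article and its parts rather than only whole documents, which is one way to read the abstract's claim about querying with partial knowledge of the structure.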
Document type :
Theses
Cited literature [129 references]

https://tel.archives-ouvertes.fr/tel-00004888
Contributor : Thèses Imag
Submitted on : Thursday, February 19, 2004 - 2:37:50 PM
Last modification on : Friday, November 6, 2020 - 4:06:04 AM
Long-term archiving on : Friday, April 2, 2010 - 8:33:37 PM

Identifiers

  • HAL Id : tel-00004888, version 1

Collections

UJF | CNRS | IMAG | UGA

Citation

Franck Fourel. Modélisation, indexation et recherche de documents structurés. Other [cs.OH]. Université Joseph-Fourier - Grenoble I, 1998. French. ⟨tel-00004888⟩

Metrics

Record views

494

Files downloads

559