Extraction et recherche d'information en langage naturel dans les documents semi-structurés

Abstract : Information retrieval in semi-structured documents (in practice,
written in XML) combines aspects of traditional information retrieval
and of database querying. The structure is very important, but the
information need is vague. The retrieval unit can vary in size (a
paragraph, a figure, an entire article, etc.). Furthermore, the
flexibility of XML can introduce breaks in the natural flow of the
text.

This setting raises many problems, notably for document content
analysis and querying. We studied the specific solutions that natural
language processing (NLP) techniques could bring. We proposed
a theoretical framework and a practical approach that allow
traditional textual analysis techniques to be applied to XML documents
while disregarding the structure. We also designed an interface for
querying XML documents in natural language, and proposed methods that
use the structure to improve the retrieval of relevant elements.
Document type : Theses

Cited literature [220 references]

https://tel.archives-ouvertes.fr/tel-00121721
Contributor : Xavier Tannier
Submitted on : Thursday, December 21, 2006 - 4:15:38 PM
Last modification on : Wednesday, June 24, 2020 - 4:18:07 PM
Long-term archiving on : Tuesday, April 6, 2010 - 8:28:36 PM

Identifiers

  • HAL Id : tel-00121721, version 1

Citation

Xavier Tannier. Extraction et recherche d'information en langage naturel dans les documents semi-structurés. domain_stic.othe. Ecole Nationale Supérieure des Mines de Saint-Etienne, 2006. French. ⟨tel-00121721⟩

Metrics

Record views : 614
File downloads : 7473