
Extraction et recherche d'information en langage naturel dans les documents semi-structurés

Abstract : Information retrieval in semi-structured documents (in practice, written in XML)
mixes aspects of traditional information retrieval and of database
querying. The structure is very important, but the information
need is vague. The retrieval unit can have different sizes (a
paragraph, a figure, an entire article...). Furthermore, the
flexibility of XML may create some breaks in the natural flow of the
text.

Many problems arise at this level, notably for document content
analysis and querying. We studied the specific solutions that
natural language processing (NLP) techniques could bring. We proposed
a theoretical framework and a practical approach that allow
traditional textual analysis techniques to be applied to XML documents, disregarding
the structure. We also designed an interface for querying XML documents
in natural language, and proposed methods that use the structure
to improve the retrieval of relevant elements.

Cited literature : 220 references
Contributor : Xavier Tannier
Submitted on : Thursday, December 21, 2006 - 4:15:38 PM
Last modification on : Wednesday, June 24, 2020 - 4:18:07 PM
Long-term archiving on : Tuesday, April 6, 2010 - 8:28:36 PM


  • HAL Id : tel-00121721, version 1


Xavier Tannier. Extraction et recherche d'information en langage naturel dans les documents semi-structurés. domain_stic.othe. Ecole Nationale Supérieure des Mines de Saint-Etienne, 2006. Français. ⟨tel-00121721⟩


