
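Compression automatique ou semi-automatique de textes par élagage des constituants effaçables : une approche interactive et indépendante des corpus [Automatic or semi-automatic text compression by pruning deletable constituents: an interactive, corpus-independent approach]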

Abstract : This research belongs to the field of Natural Language Processing and focuses more specifically on text summarization.
The originality of this thesis lies in tackling a little-studied type of summarization: text compression using an unsupervised method.
This work presents an interactive and incremental system that prunes syntagmatic trees while preserving syntactic coherence and the main informational content.
On the theoretical side, this work is based on Noam Chomsky's Government and Binding theory, and more precisely on the formal representation of X-bar theory, with the aim of providing a strong foundation for a computational model compatible with the syntactic compression of sentences.
This work led to a working software tool, named COLIN, which offers two modes: fully automated compression, and semi-automated summarization assistance driven by close interaction with the user.
The software was evaluated using a fairly elaborate protocol involving 25 volunteers.
The experimental results show that 1) the notion of a reference abstract, which is the basis of classical evaluation, is at least questionable; 2) semi-automated compression was highly rated by users; and 3) fully automated compression also achieves respectable satisfaction levels.
With a compression ratio of over 40% for all text genres, COLIN provides appreciable support for text compression, without relying on a training corpus and with a user-friendly interface.
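
As a rough illustration of the general idea of compressing a sentence by pruning deletable constituents, the following minimal Python sketch removes flagged subtrees from a toy constituent tree. It is not COLIN's implementation: the `Node` class, the `deletable` flag, the category labels, and the `removed_share` measure (fraction of words removed, which is only one possible reading of "compression ratio") are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A constituent: a leaf holding one word, or a phrase with children."""
    label: str                      # illustrative category label, e.g. "S", "NP", "PP"
    word: str = ""                  # non-empty only on leaves
    deletable: bool = False         # hypothetical flag: constituent may be removed
    children: List["Node"] = field(default_factory=list)


def leaf(word: str, label: str = "W") -> Node:
    """Build a leaf node for a single word."""
    return Node(label=label, word=word)


def words(node: Node) -> List[str]:
    """Return the words of the subtree in left-to-right order."""
    if not node.children:
        return [node.word] if node.word else []
    out: List[str] = []
    for child in node.children:
        out.extend(words(child))
    return out


def prune(node: Node) -> Node:
    """Return a copy of the tree with every deletable child constituent removed."""
    kept = [prune(child) for child in node.children if not child.deletable]
    return Node(node.label, node.word, node.deletable, kept)


def removed_share(original: Node, compressed: Node) -> float:
    """Fraction of the original words that pruning removed (assumed measure)."""
    total = len(words(original))
    return 0.0 if total == 0 else 1 - len(words(compressed)) / total


# Toy sentence: "the committee approved the plan after a long debate".
# The trailing prepositional phrase is marked deletable by hand here;
# a real system would decide this from syntactic analysis.
sentence = Node("S", children=[
    Node("NP", children=[leaf("the"), leaf("committee")]),
    Node("VP", children=[
        leaf("approved"),
        Node("NP", children=[leaf("the"), leaf("plan")]),
        Node("PP", deletable=True, children=[
            leaf("after"), leaf("a"), leaf("long"), leaf("debate"),
        ]),
    ]),
])

compressed = prune(sentence)
print(" ".join(words(compressed)))
print(round(removed_share(sentence, compressed), 2))
```

In this toy case the pruned sentence remains grammatical because a whole constituent is removed rather than individual words; deciding which constituents are safely deletable is the hard part and is not modelled here.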
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00185367
Contributor : Mehdi Yousfi-Monod
Submitted on : Monday, June 2, 2008 - 3:32:04 PM
Last modification on : Thursday, May 24, 2018 - 3:59:20 PM
Long-term archiving on : Friday, November 25, 2016 - 9:53:09 PM

Identifiers

  • HAL Id : tel-00185367, version 3

Citation

Mehdi Yousfi-Monod. Compression automatique ou semi-automatique de textes par élagage des constituants effaçables : une approche interactive et indépendante des corpus. Informatique [cs]. Université Montpellier II - Sciences et Techniques du Languedoc, 2007. Français. ⟨tel-00185367v3⟩

Metrics

Record views

360

Files downloads

1845