
Indexation aléatoire et similarité inter-phrases appliquées au résumé automatique (Random indexing and inter-sentence similarity applied to automatic summarization)

Abstract: With the growing mass of textual data on the Web, automatic summarization of topic-oriented document collections has become an important research field in Natural Language Processing. The experiments described in this thesis fall within this context. Evaluating the semantic similarity between sentences is central to our work, and we based our approach on distributional similarity and vector representations of terms, with Wikipedia as a reference corpus. We proposed several similarity measures, which were evaluated and compared on different datasets: the SemEval 2014 challenge corpus for English and datasets we built ourselves for French. The good performance shown by our measures led us to use them in a multi-document summarization task, implemented with a PageRank-type algorithm. The system was evaluated on the DUC 2007 datasets for English and the RPM2 corpus for French. This simple approach, based on a resource readily available in many languages, proved efficient and robust, and the encouraging results open up real prospects for improvement.
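The pipeline outlined in the abstract (term vectors from distributional context, a sentence similarity measure, then a PageRank-type ranking) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the vector dimension, the window size, the additive sentence vectors, the cosine measure and the three-sentence toy corpus (standing in for Wikipedia) are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random index vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def train_term_vectors(corpus, dim=64, nonzeros=4, window=2, seed=0):
    """Random indexing: a term's context vector is the sum of the index
    vectors of the terms that co-occur with it inside a sliding window."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: [0.0] * dim)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(dim, nonzeros, rng)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = context[tok]
                    for k, v in enumerate(index[tokens[j]]):
                        ctx[k] += v
    return dict(context)


def sentence_vector(sentence, term_vectors, dim):
    """Assumed composition: a sentence is the sum of its term vectors."""
    vec = [0.0] * dim
    for tok in sentence.lower().split():
        for k, v in enumerate(term_vectors.get(tok, ())):
            vec[k] += v
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_sentences(sentences, term_vectors, dim, damping=0.85, iterations=50):
    """PageRank-type power iteration on the sentence similarity graph."""
    n = len(sentences)
    vecs = [sentence_vector(s, term_vectors, dim) for s in sentences]
    # Edge weights: cosine similarity, negatives clipped to 0, no self-loops.
    w = [[max(0.0, cosine(vecs[i], vecs[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * w[j][i] / out[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


# Toy data (hypothetical); the thesis uses Wikipedia as the reference corpus.
sentences = [
    "the cat sat on the mat",
    "a cat lay on the rug",
    "stock markets fell sharply today",
]
dim = 64
term_vectors = train_term_vectors(sentences, dim=dim)
scores = rank_sentences(sentences, term_vectors, dim)
for score, text in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {text}")
```

An extractive summary would then keep the top-ranked sentences up to a length budget. In practice the term vectors would be trained once on a large reference corpus and reused across document collections, rather than trained on the sentences being summarized as in this toy example.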

Cited literature: 126 references
Contributor: Abes Star
Submitted on: Thursday, June 30, 2016 - 10:10:07 AM
Last modification on: Friday, September 25, 2020 - 3:36:03 AM
Long-term archiving on: Saturday, October 1, 2016 - 10:42:16 AM


Version validated by the jury (STAR)


  • HAL Id: tel-01339872, version 1



Hai Hieu Vu. Indexation aléatoire et similarité inter-phrases appliquées au résumé automatique. Document and Text Processing. Université de Bretagne Sud, 2016. French. ⟨NNT: 2016LORIS395⟩. ⟨tel-01339872⟩


