Calcul de centralité et identification de structures de communautés dans les graphes de documents [Centrality computation and community structure identification in document graphs]

Nacim Fateh Chikhi 1
1 IRIT-MELODI - MEthodes et ingénierie des Langues, des Ontologies et du DIscours
IRIT - Institut de recherche en informatique de Toulouse
Abstract: This thesis addresses the characterization of large document collections, using the links between documents, in order to make them easier for humans or software tools to use and exploit. The first part deals with centrality computation in document graphs. We review existing centrality algorithms, focusing on the TKC (Tightly Knit Community) effect, which affects most existing centrality measures. We then propose three new centrality algorithms (MHITS, NHITS and DocRank) that counter the TKC effect, and we evaluate them against existing approaches on several graphs using several evaluation measures. The second part deals with document clustering, which we treat as a task of community structure identification (CSI) in document graphs. We review existing CSI approaches, distinguishing those based on a generative model from algorithmic (traditional) ones. We then propose a generative model (SPCE) for CSI in sparse graphs, based on smoothing and an appropriate initialization. The SPCE model is evaluated and validated by comparison with other CSI approaches. Finally, we show that the SPCE model can be extended to take both the links and the content of documents into account.
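As a rough illustration of the TKC effect mentioned in the abstract, the sketch below runs the standard HITS hub/authority iteration on a toy graph. It is not one of the thesis's algorithms (MHITS, NHITS, DocRank and SPCE are not reproduced here), and the function name `hits`, the graph and the iteration count are assumptions made only for this example. The toy graph has a small, densely interlinked group (nodes 0 to 5) next to a document (node 11) that receives more incoming links but from pages that link nowhere else.

```python
from math import sqrt


def _normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def hits(edges, n_nodes, n_iter=50):
    """Standard HITS power iteration on a directed graph.

    edges: list of (source, target) pairs, nodes numbered 0..n_nodes-1.
    Returns (authority, hub) score lists, each L2-normalized.
    """
    auth = [1.0] * n_nodes
    hub = [1.0] * n_nodes
    for _ in range(n_iter):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = [0.0] * n_nodes
        for s, t in edges:
            new_auth[t] += hub[s]
        # Hub update: sum the new authority scores of the pages linked to.
        new_hub = [0.0] * n_nodes
        for s, t in edges:
            new_hub[s] += new_auth[t]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


# Hypothetical toy graph (an assumption for illustration only).
# Tightly knit community: hubs 0-2 each link to all of authorities 3-5.
edges = [(h, a) for h in range(0, 3) for a in range(3, 6)]
# Looser community: hubs 6-10 each link only to node 11.
edges += [(h, 11) for h in range(6, 11)]

auth, hub = hits(edges, n_nodes=12)

in_degree_3 = sum(1 for _, t in edges if t == 3)
in_degree_11 = sum(1 for _, t in edges if t == 11)

print("in-degree of node 3:", in_degree_3, "authority:", auth[3])
print("in-degree of node 11:", in_degree_11, "authority:", auth[11])
```

Node 11 has the highest in-degree, yet its authority score decays toward zero because the iteration converges to the principal eigenvector, which is dominated by the densely connected group. This is the kind of bias that TKC-aware centrality measures aim to reduce.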
Document type: Theses

https://tel.archives-ouvertes.fr/tel-00619177
Contributor: Nacim Fateh Chikhi
Submitted on: Monday, September 5, 2011 - 4:48:29 PM
Last modification on: Sunday, June 14, 2020 - 3:28:59 AM
Long-term archiving on: Tuesday, December 6, 2011 - 2:26:46 AM

Identifiers

  • HAL Id: tel-00619177, version 1

Citation

Nacim Fateh Chikhi. Calcul de centralité et identification de structures de communautés dans les graphes de documents. Human-Computer Interaction [cs.HC]. Université Paul Sabatier - Toulouse III, 2010. French. ⟨tel-00619177⟩
