
Information diffusion, information and knowledge extraction from social networks

Thi Bich Ngoc Hoang 1
1 IRIT-SIG - Systèmes d’Informations Généralisées
IRIT - Institut de recherche en informatique de Toulouse
Abstract : The popularity of online social networks has increased rapidly over the last decade. According to Statista, approximately 2 billion people used social networks in January 2018, and this number is expected to keep growing in the coming years. Beyond their primary purpose of connecting people, social networks also play a major role in connecting marketers with customers, famous people with their supporters, and people who need help with people willing to give it. The success of online social networks relies mainly on the information that messages carry and on the speed at which those messages spread. Our research aims at modeling message diffusion and at extracting and representing information and knowledge from messages on social networks.

Our first contribution is a model to predict the diffusion of information on social networks. More precisely, we predict whether a tweet is going to be diffused and, if so, the level of diffusion. The model is based on three types of features: user-based, time-based and content-based. Evaluated on various collections amounting to tens of millions of tweets, our model significantly improves effectiveness (F-measure) over the state of the art, both when predicting whether a tweet is going to be retweeted and when predicting the level of retweet.

The second contribution of this thesis is an approach to extract information from microblogs. A message about an event contains several important pieces of information, such as location, time and related entities; we focus on location, which is vital for several applications, especially geo-spatial applications and applications linked to events. We propose different combinations of existing methods to extract locations from tweets, targeting either recall-oriented or precision-oriented applications. We also define a model to predict whether a tweet contains a location. We show that the precision of location extraction tools is significantly higher on the tweets we predict to contain a location than on the full set of tweets.

Our last contribution is a knowledge base that better represents the information in a set of tweets about events. We combine a tweet collection with other Internet resources to build a domain ontology. The knowledge base aims at giving users a complete picture of the events referenced in the tweet collection (we considered the CLEF 2016 festival tweet collection).
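The abstract reports effectiveness as F-measure and compares the precision of location extraction across tweet subsets. As a reminder of how these metrics relate, here is a minimal sketch for a binary task such as "retweeted or not". It assumes the balanced F-measure (F1); the thesis may use a different weighting. The function name and the toy labels are hypothetical and are not taken from the thesis data.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall and balanced F-measure (F1).

    `predicted` and `actual` are equal-length sequences of booleans,
    e.g. True when a tweet is predicted (or observed) to be retweeted.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels, made up purely for illustration.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
precision, recall, f1 = precision_recall_f1(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

A precision-oriented application would favour configurations that raise the first value, a recall-oriented one the second; the F-measure summarises the trade-off between the two.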

Cited literature [92 references]
Contributor : Abes Star
Submitted on : Thursday, January 30, 2020 - 12:00:08 PM
Last modification on : Tuesday, October 19, 2021 - 2:23:34 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02460788, version 1


Thi Bich Ngoc Hoang. Information diffusion, information and knowledge extraction from social networks. Information Retrieval [cs.IR]. Université Toulouse le Mirail - Toulouse II, 2018. English. ⟨NNT : 2018TOU20078⟩. ⟨tel-02460788⟩


