Personalized top-k processing: from centralized to decentralized systems

Xiao Bai 1
1 ASAP - As Scalable As Possible: foundations of large scale dynamic distributed systems
Inria Rennes – Bretagne Atlantique , IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : The Web 2.0 revolution has transformed the Internet from a read-only infrastructure into an active read-write platform. The rapidly increasing amount of user-generated content in collaborative tagging systems provides a huge source of information. Yet performing effective search becomes more challenging, especially when we seek the most appropriate items that match a potentially ambiguous query. Personalization is appealing in this context, as it limits the search for items to a small network of participants with similar interests. However, centralized solutions for this personalization do not scale, given the large amount of information that needs to be maintained on a per-user basis, and given the dynamic nature of these systems, where users continuously change their profiles by tagging new items. This thesis therefore addresses the efficiency and scalability of personalized query processing, from centralized to decentralized systems, along two axes: (i) off-line personalization, which relies on users' past tagging behaviors, and (ii) on-line personalization, which relies on both past behaviors and the current query. We first present the algorithm P3K, which decentralizes a state-of-the-art approach and achieves off-line personalized top-k processing in peer-to-peer systems. We then present P4Q, an extension of P3K that improves system performance in terms of storage, bandwidth and robustness. Both P3K and P4Q rely on gossip-based protocols to capture the implicit similarity between users and to associate each user with a set of social acquaintances used to process the query. Analytical and experimental evaluations demonstrate their scalability and efficiency for top-k query processing, as well as the inherent ability of P4Q to cope with users updating their profiles and leaving the system.
To further improve result quality for queries that reflect emerging interests of the querier, we propose a hybrid interest model that takes into account both the tagging profile and the query to perform personalized query processing. This is achieved on-line in a centralized system by performing top-k processing twice with the algorithm DT². We then propose the algorithm DT²P², which efficiently performs the same on-line personalization with improved scalability in a fully decentralized system. Experimental results on real datasets show that on-line personalization is a promising way to satisfy diverse user preferences, and that the proposed algorithms make it feasible in both centralized and decentralized systems.
Document type :
Networking and Internet Architecture [cs.NI]. INSA de Rennes, 2010. English
Contributor : Xiao Bai
Submitted on : Friday, December 10, 2010 - 7:03:53 PM
Last modification on : Thursday, May 14, 2015 - 1:10:45 AM


  • HAL Id : tel-00545642, version 1



Xiao Bai. Personalized top-k processing: from centralized to decentralized systems. Networking and Internet Architecture [cs.NI]. INSA de Rennes, 2010. English. <tel-00545642>



