
Towards Accurate and Scalable Recommender Systems

Abstract : Recommender systems aim to pre-select and present first the information in which users may be interested. This has drawn the attention of the e-commerce sector, where users' interests are analysed in order to predict future interests and to personalize offers (a.k.a. items). Recommender systems exploit the current preferences of users and the features of items and users in order to predict users' future preferences for items. Although they have proven accurate in many domains, these systems still pose great challenges for both academia and industry: they require distributed techniques to deal with huge volumes of data, they aim to exploit very heterogeneous data, and they suffer from cold-start, a situation in which the system has no (or not enough) information about (new) users or items to provide accurate recommendations. Among popular techniques, Matrix Factorization has demonstrated highly accurate predictions and good scalability, since its analysis can be parallelized across multiple machines. However, it has two main drawbacks: (1) the difficulty of integrating external heterogeneous data such as items' features, and (2) the cold-start issue. The objective of this thesis is to address several challenges in the field of recommender systems: (1) recommendation techniques involve complex analysis and huge volumes of data; to reduce analysis time, these techniques need to parallelize the process across multiple machines, (2) collaborative filtering techniques do not naturally take items' descriptions into account in the recommendation, although this information may help to produce more accurate recommendations, (3) in very large datasets, users' and items' descriptions can become large and memory-consuming, which makes data analysis more complex, and (4) the new-user cold-start is particularly important for making recommendations to new users and for ensuring their loyalty.
Our contributions to this area fall into four aspects: (1) we improve the distribution of a matrix factorization recommendation algorithm in order to achieve better scalability, (2) we enhance the recommendations produced by matrix factorization by studying the implicit interest of users in the attributes of items, (3) we propose an accurate and compact binary vector based on Bloom Filters for representing users/items described by a large number of features with low memory consumption, and (4) we address the new-user cold-start in collaborative filtering by using active learning techniques. The experiments use the publicly available MovieLens dataset and IMDb database, which allows fair comparisons with the state of the art. Our contributions demonstrate their performance in terms of accuracy and efficiency.
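As an illustration of the Bloom-Filter-based binary vector mentioned in contribution (3), the following is a minimal, generic sketch of how a set of item features can be hashed into a short fixed-length bit vector. It is not the construction used in the thesis, which the abstract does not detail; the function names, the use of SHA-256, and the parameters `num_bits` and `num_hashes` are all assumptions made for the example.

```python
import hashlib


def _positions(feature, num_bits, num_hashes):
    """Return the bit positions a feature maps to (hypothetical helper)."""
    positions = []
    for seed in range(num_hashes):
        # Salting the feature with the seed simulates independent hash functions.
        digest = hashlib.sha256(f"{seed}:{feature}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def bloom_vector(features, num_bits=64, num_hashes=3):
    """Encode a set of feature strings as a fixed-length list of 0/1 bits."""
    bits = [0] * num_bits
    for feature in features:
        for index in _positions(feature, num_bits, num_hashes):
            bits[index] = 1
    return bits


def might_contain(bits, feature, num_hashes=3):
    """True if the feature may be present; False means it is definitely absent."""
    return all(bits[i] == 1 for i in _positions(feature, len(bits), num_hashes))


def shared_bits(bits_a, bits_b):
    """Count positions set in both vectors, a rough proxy for feature overlap."""
    return sum(a & b for a, b in zip(bits_a, bits_b))


# Assumed example features, not taken from MovieLens or IMDb.
movie = bloom_vector({"genre:comedy", "actor:A", "director:B"})
print(len(movie), might_contain(movie, "genre:comedy"))
```

The vector length stays constant however many features are inserted, which is where the memory saving comes from; the price is that `might_contain` can return false positives, and their rate grows as more bits are set.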

Cited literature [174 references]
Contributor : Abes Star
Submitted on : Tuesday, November 13, 2018 - 1:01:20 AM
Last modification on : Saturday, December 21, 2019 - 3:49:03 AM
Long-term archiving on : Thursday, February 14, 2019 - 1:11:49 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01920124, version 1



Manuel Pozo. Towards Accurate and Scalable Recommender Systems. Information Retrieval [cs.IR]. Conservatoire national des arts et metiers - CNAM, 2016. English. ⟨NNT : 2016CNAM1061⟩. ⟨tel-01920124⟩


