
Recommandation diversifiée et distribuée pour les données scientifiques [Diversified and distributed recommendation for scientific data]

Maximilien Servajean 1
1 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : In many fields, novel technologies for information acquisition and measurement (e.g. automated phenotyping greenhouses) are producing data at an unprecedented rate. We focus on two real use cases: plant observations in botany and phenotyping data in biology. Our contributions can, however, be generalized to Web data. In addition to being voluminous, the data are distributed: each user stores their data across many heterogeneous sites (e.g. personal computers, servers, cloud), yet wants to be able to share them. In both use cases, collaborative solutions, including distributed search and recommendation techniques, could benefit the user.

The overall objective of this work is therefore to define a set of techniques enabling the sharing and discovery of data in heterogeneous distributed environments through search and recommendation approaches.

Search and recommendation present users with sets of results, or recommendations, that are relevant both to the queries they submit and to their profiles. Diversification techniques provide results with greater novelty while avoiding redundant and repetitive content. By enforcing a distance between the results presented to the user, diversity makes it possible to return a broader set of relevant items. However, few works exploit profile diversity, which takes into account the users who share each item. In this work, we show that in some scenarios, considering profile diversity yields a substantial increase in result quality: surveys show that in more than 75% of cases, users prefer profile diversity to content diversity.

Two approaches are possible to address the problems raised by data distribution among heterogeneous sites.

First, P2P networks establish links between peers (the nodes of the network), creating an overlay network in which the peers directly connected to a given peer p are known as its neighbors. This overlay is used to process the queries submitted by each peer. However, in state-of-the-art solutions, the redundancy of peers across neighborhoods limits the system's capacity to retrieve relevant items from the network for the queries submitted by the users. In this work, we show that introducing diversity in the computation of the neighborhood, thereby increasing its coverage, yields a large gain in quality. When diversity is taken into account, each peer in a given neighborhood has a higher probability of returning results different from those of the other peers in the neighborhood for a given keyword query. Whenever a query is submitted by a peer, our approach can retrieve up to three times more relevant items than state-of-the-art solutions.

The second category of approaches is called multi-site. In state-of-the-art multi-site solutions, the sites are generally homogeneous and consist of big data centers. In our context, we propose an approach enabling sharing among heterogeneous sites, such as small research teams' servers, personal computers or large cloud sites. A prototype bringing together all contributions has been developed, with two versions, one for each of the use cases considered in this thesis.
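To make the idea of diversification concrete, here is a minimal sketch of a greedy selection that trades relevance against distance to the results already chosen. It is not the algorithm of the thesis: the scoring formula, the `trade_off` parameter, the Jaccard distance and the function names are all assumptions chosen for illustration. Each item is described by a set: a set of keywords would give content diversity, while the set of users who share the item would give a form of profile diversity.

```python
from typing import Dict, FrozenSet, List


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Distance in [0, 1]: 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diversify(
    relevance: Dict[str, float],
    features: Dict[str, FrozenSet[str]],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedily pick up to k items (hypothetical scoring, for illustration).

    trade_off = 0 ranks by relevance only; higher values favor items
    that are far from everything already selected.
    """
    selected: List[str] = []
    candidates = set(relevance)

    def score(item: str) -> float:
        if not selected:
            return relevance[item]
        # Distance to the closest item already in the result list.
        min_dist = min(
            jaccard_distance(features[item], features[s]) for s in selected
        )
        return (1.0 - trade_off) * relevance[item] + trade_off * min_dist

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Items described by the (hypothetical) users who share them.
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
sharers = {
    "a": frozenset({"u1", "u2"}),
    "b": frozenset({"u1", "u2"}),
    "c": frozenset({"u3", "u4"}),
}
print(diversify(relevance, sharers, k=2, trade_off=0.0))
print(diversify(relevance, sharers, k=2, trade_off=0.5))
```

With `trade_off=0.0` the two most relevant items are returned even though the same users share both. With `trade_off=0.5` the second slot goes to the less relevant item shared by a different group of users, which is the kind of broader result set the abstract describes.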

Cited literature [209 references]
Contributor : Abes Star
Submitted on : Wednesday, July 10, 2019 - 1:44:08 PM
Last modification on : Thursday, July 11, 2019 - 1:24:15 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02179049, version 1



Maximilien Servajean. Recommandation diversifiée et distribuée pour les données scientifiques. Réseaux sociaux et d'information [cs.SI]. Université Montpellier II - Sciences et Techniques du Languedoc, 2014. Français. ⟨NNT : 2014MON20216⟩. ⟨tel-02179049⟩


